Interval Index

Roger Hui



0. Definition

Dyadic function: ⍺⍸⍵

The following definition borrows substantially from the definition of I. in J, implemented no later than 2006-01-12.

is duplicate-free and sorted in ascending order, and therefore partitions the domain into contiguous intervals from ¯∞ to . Ifhas the shape of a major cell of , then ⍺⍸⍵ is the largest j , the interval index, such that j⌷⍺ precedesin the ordering, or ⎕io-1 ifprecedes the first major cell ofor ifhas no major cells. In general, the search applies to the rank 0⌈(⍴⍴⍺)-1 cells of .

Since sorting is not ⎕ct- or ⎕dct-dependent then neither is . The result ofis ⎕io-dependent. This text is composed with ⎕io←1 .
 

1. Examples

1.1 A

   x ← ¯1 1 2 4 5.5
   y ← 0.5ׯ5+⍳18

   y ,[0.5] x⍸y
¯2 ¯1.5 ¯1 ¯0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5
 0  0    1  1   1 1   2 2   3 3   3 3   4 4   4 5   5 5  

The result derives as follows. Since x is sorted, it partitions the domain into contiguous intervals from ¯∞ to . (The intervals don’t intersect; that’s what “partition” means.) The interval index of a cell of y is the index of the interval that contains that cell. In the diagram below, ( or ) means a half-open interval excluding the endpoint; [ or ] means a half-closed interval including the endpoint.

This derivation is equivalent to what obtains from the language used in the definition (“the largest j such that j⌷⍺ precedes in the ordering”).

The diagram is a schematic only. The intervals are not necessarily equal in size and as depicted are not proportional to the endpoint values.

¯∞      ¯1        1        2        4        5.5     ∞
↓        ↓        ↓        ↓        ↓        ↓       ↓
(       )[       )[       )[       )[       )[       )
    0        1        2        3        4        5

The ( ) and [ ] notation used here follows conventional mathematical notation, but usesandinstead of < and to avoid confusion with ordering on the real numbers.

IntervalThe set of all x such that
(a,b)  ←→    a ≺ x ≺ b
(a,b]  ←→    a ≺ x ≼ b
[a,b)  ←→    a ≼ x ≺ b
[a,b]  ←→    a ≼ x ≼ b

x is required to be sorted but y is not. Moreover, the result for cells of y are computed independently. Therefore:

   (x⍸y)[p] ≡ x⍸y[p←?⍨≢y]
1
   (4 6⍴x⍸y) ≡ x⍸4 6⍴y
1

1.2 B

⍺⍸⍵ looks for major cells ofand therefore imposes requirements on the rank and (trailing) shape of .

   x ← 4 2 3 ⍴ 5    
   ⍝ right argument required to have trailing shape 2 3

   x ⍸ 5
RANK ERROR
   x⍸5
  ^

   x ⍸ 3⍴5
RANK ERROR
   x⍸3⍴5
  ^

   x ⍸ 3 2⍴5
LENGTH ERROR
   x⍸3 2⍴5
  ^

   3 ⍸ 2 6 1
RANK ERROR
   3⍸2 6 1
  ∧

1.3 C

   x ← ↑ 'Fi' 'Jay' 'John' 'Morten' 'Roger'
   x
Fi
Jay   
John  
Morten
Roger 
   ⍴x
5 6
   y ← x ⍪ ↑ 'JD' 'Jd' 'Geoff' 'Alpha' 'Omega' 'Zeus  '
   y
Fi
Jay   
John  
Morten
Roger 
JD    
Jd    
Geoff 
Alpha 
Omega 
Zeus  

   x ⍸ y
1 2 3 4 5 1 2 1 0 4 5
   y ,⍪ x⍸y
Fi     1
Jay    2
John   3
Morten 4
Roger  5
JD     1
Jd     2
Geoff  1
Alpha  0
Omega  4
Zeus   5

   x⍳x
1 2 3 4 5
   x⍸x
1 2 3 4 5
¯∞       Fi       Jay      John     Morten   Roger   ∞
↓        ↓        ↓        ↓        ↓        ↓       ↓
(       )[       )[       )[       )[       )[       )
    0        1        2        3        4        5
  Alpha    Fi        Jay     John     Morten   Roger
           JD        Jd               Omega    Zeus
           Geoff      
   x1 ← 'Fi' 'Jay' 'John' 'Morten' 'Roger'
   y1 ← x1⍪'JD' 'Jd' 'Geoff' 'Alpha' 'Omega' 'Zeus'
   x1
┌──┬───┬────┬──────┬─────┐
│Fi│Jay│John│Morten│Roger│
└──┴───┴────┴──────┴─────┘
   y1
┌──┬───┬────┬──────┬─────┬──┬──┬─────┬─────┬─────┬────┐
│Fi│Jay│John│Morten│Roger│JD│Jd│Geoff│Alpha│Omega│Zeus│
└──┴───┴────┴──────┴─────┴──┴──┴─────┴─────┴─────┴────┘
   x1 ⍸ y1    ⍝ this requires an extension to ⍋
1 2 3 4 5 1 2 1 0 4 5

   x1 ⍳ x1
1 2 3 4 5
   x1 ⍸ x1
1 2 3 4 5

1.4 D

From Sixteen Amuse-Bouches in APL F.

An illustration of the central limit theorem, that the sum of independent random variables converges to the normal distribution.

   t ← ¯1+{≢⍵}⌸(⍳41),(5×⍳40)⍸+⌿?10 1e6⍴21
   ⍴t
41
   5 8⍴t
    0     0     0     0      0     13    28    90
  317   894  2095  4574   8671  15001 24338 36728
51254 66804 82787 93943 101045 101752 96510 85281
70418 54506 39802 27267  16964   9764  5031  2467
 1059   422   136    32      6      1     0     0

t counts the number of occurrences in the intervals with endpoints 5×⍳40 , of 1e6 samples from the sum of ten repetitions of ?21 . A plot of t :

1.5 E

t are timestamps and x are corresponding data, in chronological order.

   ⍴t
200000 3
   ⍴x
200000

   (10↑t) (¯10↑t)
┌─────┬────────┐
│0 0 0│23 59 54│
│0 0 0│23 59 55│
│0 0 0│23 59 56│
│0 0 0│23 59 56│
│0 0 0│23 59 58│
│0 0 2│23 59 58│
│0 0 3│23 59 59│
│0 0 3│23 59 59│
│0 0 4│23 59 59│
│0 0 5│23 59 59│
└─────┴────────┘
   x
3984300 2020650 819000 1677100 3959200 2177250 3431800 ...

u are timestamps for 5-minute intervals:

   ⍴u
288 3
   (10↑u) (¯10↑u)
┌──────┬───────┐
│0  0 0│23 10 0│
│0  5 0│23 15 0│
│0 10 0│23 20 0│
│0 15 0│23 25 0│
│0 20 0│23 30 0│
│0 25 0│23 35 0│
│0 30 0│23 40 0│
│0 35 0│23 45 0│
│0 40 0│23 50 0│
│0 45 0│23 55 0│
└──────┴───────┘

Therefore, the expression (u⍸t){+/⍵}⌸x summarizes x in 5-minute intervals.

   u ⍸ t
1 1 1 1 1 1 1 1 1 1 ... 288 288 288 288 288 288

   (u⍸t) {+/⍵}⌸ x
1339083050 1365108650 1541944750 1393476000 1454347100 ...

   (u⍸t) {(⍺⌷u),+/⍵}⌸ x
 0  0 0 1339083050
 0  5 0 1365108650
 0 10 0 1541944750
 0 15 0 1393476000
   ...
23 45 0 1388823150
23 50 0 1453472350
23 55 0 1492078850

2. Model

I ←{(r↓⍴⍵)⍴(≢⍺)↓i⊣i[i]←+\(1+≢⍺)>i←⍋⍺⍪,[r↓⍳⍴⍴⍵]⍵⊣r←1-⍴⍴⍺}
Ia←{(r↓⍴⍵)⍴(≢⍺)↓(⊂⍋i)⌷ +\(1+≢⍺)>i←⍋⍺⍪,[r↓⍳⍴⍴⍵]⍵⊣r←1-⍴⍴⍺}

A version of I first appeared in section 1.2 of Some Uses of { and } in 1987. Ia is to illustrate that the curious construct i[i]←x is just a faster way to rearrange x according to the inverse of permutation i .

The actual implementation of the primitive would likely use a binary search rather than gradingandin their entirety, especially whenis large relative to .

2.1 Tests

assert←{⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕signal 8 ⋄ shy←0}
Err   ←{0::⎕io⊃⎕dm ⋄ 0=⎕nc '⍺':⍺⍺ ⍵ ⋄ ⍺ ⍺⍺ ⍵}

 z←test_interval_index io;⎕io;i;p;x;y
⍝ tests on monadic ⍸

 ⎕io←io

 x←¯1 2 3 7 8.5
 y←0.5×(¯3-⎕io)+⍳24
 i←x⍸y
 assert i≡(~⎕io)-⍨+⌿x∘.≤y
 assert i[p]≡(x⍸y)[p←?⍨≢y]
 assert(2 3 4⍴i)≡x⍸2 3 4⍴y
 assert i≡(2/⍪x)⍸2/⍪y
 assert(x⍳x)[p]≡x⍸x[p←?4 3⍴≢x]

 x←'eiou'
 y←⎕A[?23⍴26]
 i←x⍸y
 assert i≡(~⎕io)-⍨+⌿(⎕A⍳x)∘.≤(⎕A⍳y)
 assert i[p]≡(x⍸y)[p←?⍨≢y]
 assert(2 3 4⍴i)≡x⍸2 3 4⍴y
 assert i≡(2/⍪x)⍸2/⍪y
 assert(x⍳x)[p]≡x⍸x[p←?4 3⍴≢x]

 assert'LENGTH ERROR'≡(3 4⍴⍳12)⍸Err 3 2⍴⍳6
 assert'RANK ERROR'≡(4 2 3⍴5)⍸Err⍳3

 z←1⊣2 ⎕NQ'.' 'wscheck'

3. Symbol

It is proposed that interval index be the dyadic meaning of (whose monadic meaning is where, {(,⍵)/,⍳⍴~~⍵}). This happens to be the same pairing in J.

Other contenders:

•  NARS2000 uses dyadic to mean what dyadicmeans in Dyalog (look for major cells).
Some APLs (e.g. Dictionary APL) use to mean ⍸⍷ (or ⍸⍷⍨), iota underbar atop epsilon underbar, the indices of subarray match.

 

4. Notes

4.1 TAO

Extensions to the Total Array Ordering are not required but would makesnicer. The most important is the extension to nested arrays.

4.2 Sort Idioms

This may be a good time to implement the following new idioms:

{(⊂⍋⍵)⌷⍵}      {⍵⌷⍨⊂⍋⍵}
{(⊂⍒⍵)⌷⍵}      {⍵⌷⍨⊂⍒⍵}

Together with extensions to the Total Array Ordering these idioms would “sort anything”.

It was a mistake an infelicity to have implemented

{⍵[⍋⍵]}        {⍵[⍋⍵;]}  
{⍵[⍒⍵]}        {⍵[⍒⍵;]}

in 2008 instead of the more general versions here using . Anyway we can have all eight of these idioms.

4.3 Relationship to

The conformability requirements and the result shape for are the same as those for :

(1↓⍴⍺) ≡ (1-⍴⍴⍺)↑⍴⍵    

(⍴⍺⍸⍵) ≡ (1-⍴⍴⍺)↓⍴⍵
(⍴⍺⍳⍵) ≡ (1-⍴⍴⍺)↓⍴⍵

The first expression asserts that cells of are required to have same shape as a major cell of . The last two expressions assert that the shape of the result (of ⍺⍳⍵ and of ⍺⍸⍵) is the shape ofless the cell shape.

If y∊x (and x is sorted and ⎕ct=0), then x⍸y is the same as x⍳y . For example:

   x←{⍵⌷⍨⊂⍋⍵} 0.1ׯ40+?61⍴100
   y←x[?5 7⍴≢x]

   (x⍸y) ≡ x⍳y
1

4.4 Unavoidable Asymmetry

is an ordered set of n←≢⍺ endpoints which partitions the domain into n+1 intervals. The following are true:

•  The intervals do not intersect.
The intervals are contiguous (the union of them is the entire domain).
Interval indices are integers.
The interval index for the next adjacent interval is 1+ that of the current one.

Consequently, it is not possible to be more symmetric than cases A and B below:

¯∞    1⌷⍺   2⌷⍺                    n⌷⍺  ∞
↓     ↓     ↓                      ↓    ↓
(    )[    )[    ) ... [    )[    )[    )   A  ← our choice
(    ](    ](    ] ... (    ](    ](    )   B
(    )[    ](    ] ... (    ](    ](    )   C
(    )[    )[    ) ... [    )[    ](    )   D

Consequently, either n⌷⍺ has the same index as all x≽n⌷⍺ (case A), or 1⌷⍺ has the same index as all x≼1⌷⍺ (case B). The current proposal chooses case A and assigns ⎕io-1 as the first index. It has the advantage that ⍺⍸⍵ is the same as ⍺⍳⍵ if ⍵∊⍺ and ⎕ct=0 and is duplicate-free and sorted.
 


created:  2016-07-19 15:25
updated: