Interval Index 0. Definition Dyadic function: ⍺⍸⍵ The following definition borrows substantially from the definition of I. in J, implemented no later than 2006-01-12. ⍺ is duplicate-free and sorted in ascending order, and therefore partitions the domain into contiguous intervals from ¯∞ to ∞ . If ⍵ has the shape of a major cell of ⍺ , then ⍺⍸⍵ is the largest j , the interval index, such that j⌷⍺ precedes ⍵ in the ordering, or ⎕io-1 if ⍵ precedes the first major cell of ⍺ or if ⍺ has no major cells. In general, the search applies to the rank 0⌈(⍴⍴⍺)-1 cells of ⍵ . Since sorting is not ⎕ct- or ⎕dct-dependent then neither is ⍸ .
The result of ⍸ is ⎕io-dependent.
This text is composed with ⎕io←1 .
1. Examples 1.1 A x ← ¯1 1 2 4 5.5 y ← 0.5ׯ5+⍳18 y ,[0.5] x⍸y ¯2 ¯1.5 ¯1 ¯0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 0 0 1 1 1 1 2 2 3 3 3 3 4 4 4 5 5 5 The result derives as follows. Since x is sorted, it partitions the domain into contiguous intervals from ¯∞ to ∞ . (The intervals don’t intersect; that’s what “partition” means.) The interval index of a cell of y is the index of the interval that contains that cell. In the diagram below, ( or ) means a half-open interval excluding the endpoint; [ or ] means a half-closed interval including the endpoint. This derivation is equivalent to what obtains from the language used in the definition (“the largest j such that j⌷⍺ precedes ⍵ in the ordering”). The diagram is a schematic only. The intervals are not necessarily equal in size and as depicted are not proportional to the endpoint values. ¯∞ ¯1 1 2 4 5.5 ∞ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ( )[ )[ )[ )[ )[ ) 0 1 2 3 4 5 The ( ) and [ ] notation used here follows conventional mathematical notation, but uses ≺ and ≼ instead of < and ≤ to avoid confusion with ordering on the real numbers.
x is required to be sorted but y is not. Moreover, the result for cells of y are computed independently. Therefore: (x⍸y)[p] ≡ x⍸y[p←?⍨≢y] 1 (4 6⍴x⍸y) ≡ x⍸4 6⍴y 1 1.2 B ⍺⍸⍵ looks for major cells of ⍺ and therefore imposes requirements on the rank and (trailing) shape of ⍵ . x ← 4 2 3 ⍴ 5 ⍝ right argument required to have trailing shape 2 3 x ⍸ 5 RANK ERROR x⍸5 ^ x ⍸ 3⍴5 RANK ERROR x⍸3⍴5 ^ x ⍸ 3 2⍴5 LENGTH ERROR x⍸3 2⍴5 ^ 3 ⍸ 2 6 1 RANK ERROR 3⍸2 6 1 ∧ 1.3 C x ← ↑ 'Fi' 'Jay' 'John' 'Morten' 'Roger' x Fi Jay John Morten Roger ⍴x 5 6 y ← x ⍪ ↑ 'JD' 'Jd' 'Geoff' 'Alpha' 'Omega' 'Zeus ' y Fi Jay John Morten Roger JD Jd Geoff Alpha Omega Zeus x ⍸ y 1 2 3 4 5 1 2 1 0 4 5 y ,⍪ x⍸y Fi 1 Jay 2 John 3 Morten 4 Roger 5 JD 1 Jd 2 Geoff 1 Alpha 0 Omega 4 Zeus 5 x⍳x 1 2 3 4 5 x⍸x 1 2 3 4 5 ¯∞ Fi Jay John Morten Roger ∞ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ( )[ )[ )[ )[ )[ ) 0 1 2 3 4 5 Alpha Fi Jay John Morten Roger JD Jd Omega Zeus Geoff x1 ← 'Fi' 'Jay' 'John' 'Morten' 'Roger' y1 ← x1⍪'JD' 'Jd' 'Geoff' 'Alpha' 'Omega' 'Zeus' x1 ┌──┬───┬────┬──────┬─────┐ │Fi│Jay│John│Morten│Roger│ └──┴───┴────┴──────┴─────┘ y1 ┌──┬───┬────┬──────┬─────┬──┬──┬─────┬─────┬─────┬────┐ │Fi│Jay│John│Morten│Roger│JD│Jd│Geoff│Alpha│Omega│Zeus│ └──┴───┴────┴──────┴─────┴──┴──┴─────┴─────┴─────┴────┘ x1 ⍸ y1 ⍝ this requires an extension to ⍋ 1 2 3 4 5 1 2 1 0 4 5 x1 ⍳ x1 1 2 3 4 5 x1 ⍸ x1 1 2 3 4 5 1.4 D From Sixteen Amuse-Bouches in APL F. An illustration of the central limit theorem, that the sum of independent random variables converges to the normal distribution. t ← ¯1+{≢⍵}⌸(⍳41),(5×⍳40)⍸+⌿?10 1e6⍴21 ⍴t 41 5 8⍴t 0 0 0 0 0 13 28 90 317 894 2095 4574 8671 15001 24338 36728 51254 66804 82787 93943 101045 101752 96510 85281 70418 54506 39802 27267 16964 9764 5031 2467 1059 422 136 32 6 1 0 0 t counts the number of occurrences in the intervals with endpoints 5×⍳40 , of 1e6 samples from the sum of ten repetitions of ?21 . A plot of t : ![]() 1.5 E t are timestamps and x are corresponding data, in chronological order. ⍴t 200000 3 ⍴x 200000 (10↑t) (¯10↑t) ┌─────┬────────┐ │0 0 0│23 59 54│ │0 0 0│23 59 55│ │0 0 0│23 59 56│ │0 0 0│23 59 56│ │0 0 0│23 59 58│ │0 0 2│23 59 58│ │0 0 3│23 59 59│ │0 0 3│23 59 59│ │0 0 4│23 59 59│ │0 0 5│23 59 59│ └─────┴────────┘ x 3984300 2020650 819000 1677100 3959200 2177250 3431800 ... u are timestamps for 5-minute intervals: ⍴u 288 3 (10↑u) (¯10↑u) ┌──────┬───────┐ │0 0 0│23 10 0│ │0 5 0│23 15 0│ │0 10 0│23 20 0│ │0 15 0│23 25 0│ │0 20 0│23 30 0│ │0 25 0│23 35 0│ │0 30 0│23 40 0│ │0 35 0│23 45 0│ │0 40 0│23 50 0│ │0 45 0│23 55 0│ └──────┴───────┘ Therefore, the expression (u⍸t){+/⍵}⌸x summarizes x in 5-minute intervals. u ⍸ t 1 1 1 1 1 1 1 1 1 1 ... 288 288 288 288 288 288 (u⍸t) {+/⍵}⌸ x 1339083050 1365108650 1541944750 1393476000 1454347100 ... (u⍸t) {(⍺⌷u),+/⍵}⌸ x 0 0 0 1339083050 0 5 0 1365108650 0 10 0 1541944750 0 15 0 1393476000 ... 23 45 0 1388823150 23 50 0 1453472350 23 55 0 1492078850 2. Model I ←{(r↓⍴⍵)⍴(≢⍺)↓i⊣i[i]←+\(1+≢⍺)>i←⍋⍺⍪,[r↓⍳⍴⍴⍵]⍵⊣r←1-⍴⍴⍺} Ia←{(r↓⍴⍵)⍴(≢⍺)↓(⊂⍋i)⌷ +\(1+≢⍺)>i←⍋⍺⍪,[r↓⍳⍴⍴⍵]⍵⊣r←1-⍴⍴⍺} A version of I first appeared in section 1.2 of Some Uses of { and } in 1987. Ia is to illustrate that the curious construct i[i]←x is just a faster way to rearrange x according to the inverse of permutation i . The actual implementation of the primitive would likely use a binary search rather than grading ⍺ and ⍵ in their entirety, especially when ⍺ is large relative to ⍵ . 2.1 Tests assert←{⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕signal 8 ⋄ shy←0} Err ←{0::⎕io⊃⎕dm ⋄ 0=⎕nc '⍺':⍺⍺ ⍵ ⋄ ⍺ ⍺⍺ ⍵} z←test_interval_index io;⎕io;i;p;x;y ⍝ tests on monadic ⍸ ⎕io←io x←¯1 2 3 7 8.5 y←0.5×(¯3-⎕io)+⍳24 i←x⍸y assert i≡(~⎕io)-⍨+⌿x∘.≤y assert i[p]≡(x⍸y)[p←?⍨≢y] assert(2 3 4⍴i)≡x⍸2 3 4⍴y assert i≡(2/⍪x)⍸2/⍪y assert(x⍳x)[p]≡x⍸x[p←?4 3⍴≢x] x←'eiou' y←⎕A[?23⍴26] i←x⍸y assert i≡(~⎕io)-⍨+⌿(⎕A⍳x)∘.≤(⎕A⍳y) assert i[p]≡(x⍸y)[p←?⍨≢y] assert(2 3 4⍴i)≡x⍸2 3 4⍴y assert i≡(2/⍪x)⍸2/⍪y assert(x⍳x)[p]≡x⍸x[p←?4 3⍴≢x] assert'LENGTH ERROR'≡(3 4⍴⍳12)⍸Err 3 2⍴⍳6 assert'RANK ERROR'≡(4 2 3⍴5)⍸Err⍳3 z←1⊣2 ⎕NQ'.' 'wscheck' 3. Symbol It is proposed that interval index be the dyadic meaning of ⍸ (whose monadic meaning is where, {(,⍵)/,⍳⍴~~⍵}). This happens to be the same pairing in J. Other contenders:
4. Notes 4.1 TAO Extensions to the Total Array Ordering are not required but would makes ⍸ nicer. The most important is the extension to nested arrays. 4.2 Sort Idioms This may be a good time to implement the following new idioms: {(⊂⍋⍵)⌷⍵} {⍵⌷⍨⊂⍋⍵} {(⊂⍒⍵)⌷⍵} {⍵⌷⍨⊂⍒⍵} Together with extensions to the Total Array Ordering these idioms would “sort anything”. It was {⍵[⍋⍵]} {⍵[⍋⍵;]} {⍵[⍒⍵]} {⍵[⍒⍵;]} in 2008 instead of the more general versions here using ⌷ . Anyway we can have all eight of these idioms. 4.3 Relationship to ⍳ The conformability requirements and the result shape for ⍸ are the same as those for ⍳ : (1↓⍴⍺) ≡ (1-⍴⍴⍺)↑⍴⍵ (⍴⍺⍸⍵) ≡ (1-⍴⍴⍺)↓⍴⍵ (⍴⍺⍳⍵) ≡ (1-⍴⍴⍺)↓⍴⍵ The first expression asserts that cells of ⍵ are required to have same shape as a major cell of ⍺ . The last two expressions assert that the shape of the result (of ⍺⍳⍵ and of ⍺⍸⍵) is the shape of ⍵ less the cell shape. If y∊x (and x is sorted and ⎕ct=0), then x⍸y is the same as x⍳y . For example: x←{⍵⌷⍨⊂⍋⍵} 0.1ׯ40+?61⍴100 y←x[?5 7⍴≢x] (x⍸y) ≡ x⍳y 1 4.4 Unavoidable Asymmetry ⍺ is an ordered set of n←≢⍺ endpoints which partitions the domain into n+1 intervals. The following are true:
Consequently, it is not possible to be more symmetric than cases A and B below: ¯∞ 1⌷⍺ 2⌷⍺ n⌷⍺ ∞ ↓ ↓ ↓ ↓ ↓ ( )[ )[ ) ... [ )[ )[ ) A ← our choice ( ]( ]( ] ... ( ]( ]( ) B ( )[ ]( ] ... ( ]( ]( ) C ( )[ )[ ) ... [ )[ ]( ) D Consequently, either n⌷⍺ has the same index as
all x≽n⌷⍺ (case A),
or 1⌷⍺ has the same index as
all x≼1⌷⍺ (case B).
The current proposal chooses case A and assigns ⎕io-1 as the first index.
It has the advantage that ⍺⍸⍵
is the same as ⍺⍳⍵ if ⍵∊⍺ and ⎕ct=0 and ⍺
is duplicate-free and sorted.
|