⎕io=0 is assumed throughout.
A recent Forum post motivated investigations into the progressive index-of functions in the FinnAPL Idiom Library:
pix ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)} ⍝ FinnAPL Idiom 1
pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)} ⍝ FinnAPL Idiom 5
In this note, we:
- explain what is progressive index-of
- explain why the two functions work
- investigate the performance of the two functions
- provide a more general solution
Progressive Index-Of
Progressive index-of is like index-of (⍳) except that each find “uses up” the target of that find. There are no duplicates in the result with the possible exception of ≢⍺ (for “not found”). Thus:
x←'mississippi'
y←'dismiss'
x pix y
11 1 2 0 4 3 5
The following chart illustrates a step-by-step derivation of each progressive index:
0 1 2 3 4 5 6 7 8 9 10
m i s s i s s i p p i d i s m i s s
11
m i s s i s s i p p i d i s m i s s
11 1
m i s s i s s i p p i d i s m i s s
11 1 2
m i s s i s s i p p i d i s m i s s
11 1 2 0
m i s s i s s i p p i d i s m i s s
11 1 2 0 4
m i s s i s s i p p i d i s m i s s
11 1 2 0 4 3
m i s s i s s i p p i d i s m i s s
11 1 2 0 4 3 5
It is possible to compute the progressive index without looping or recursion, as the two FinnAPL functions demonstrate.
Why It Works
The basic idea of ⍺ pix ⍵ is to substitute for each item of ⍺ and ⍵ an equivalent representative, collectively c and d, whence the result obtains as c⍳d. The equivalent representative used here is ranking, specifically the ranking of the indices in ⍺.
The ranking of an array ⍵ is a permutation of order ≢⍵. The smallest major cell is assigned 0; the next smallest is assigned 1; and so on. Ties are resolved by favoring the earlier-occurring cell. The ranking can be computed by ⍋⍋⍵. For example:
x ⍪ ⍉⍪ ⍋⍋x
m i s s i s s i p p i
4 0 7 8 1 9 10 2 5 6 3
y ⍪ ⍉⍪ ⍋⍋y
d i s m i s s
0 1 4 3 2 5 6
⍺ pix ⍵ works on two different rankings of indices in ⍺:
⍋⍋⍺⍳⍺,⍵ rankings of indices in ⍺ of ⍺ and ⍵, favoring ⍺
⍋⍋⍺⍳⍵,⍺ rankings of indices in ⍺ of ⍵ and ⍺, favoring ⍵
The first ⍴⍺ items of the former are those for ⍺ and the first ⍴⍵ of the latter are those for ⍵, and we get
pix ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)}
The second version depends on the following properties of permutations. Let p be a permutation. Then p[⍋p] ←→ ⍳≢p, the identity permutation, and therefore ⍋p is the inverse of p. Furthermore, p[p⍳⍳≢p] ←→ ⍳≢p and so p⍳⍳≢p is also the inverse of p. The inverse is unique (that’s why it’s called the inverse), therefore ⍋p ←→ p⍳⍳≢p.
p←97?97 ⍝ a random permutation
p[⍋p] ≡ ⍳≢p
1
p[p⍳⍳≢p] ≡ ⍳≢p
1
(⍋p) ≡ p⍳⍳≢p
1
The two rankings are permutations (because the leftmost functions are ⍋) and we just need the first ⍴⍺ items of the former and the first ⍴⍵ items of the latter. Thus:
pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)}
Performance
We note that both versions of pix contain the expressions ⍺⍳⍺,⍵ and ⍺⍳⍵,⍺, but the latter is just a rotation of the former. Thus:
pixb ← {i←⍺⍳⍺,⍵ ⋄ ((⍴⍺)⍴⍋⍋i) ⍳ ((⍴⍵)⍴⍋⍋(⍴⍺)⌽i)}
pixc ← {i←⍺⍳⍺,⍵ ⋄ ((⍋i)⍳⍳⍴⍺) ⍳ ((⍋(⍴⍺)⌽i)⍳⍳⍴⍵)}
Which is faster? The answer may surprise.
x←?1e6⍴3e5
y←?2e5⍴3e5
cmpx 'x pixb y' 'x pixc y'
x pixb y → 9.15E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x pixc y → 9.21E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
A few factors about the Dyalog APL interpreter are relevant to this performance:
- Computing
⍺⍳⍵,⍺as a rotation of an already computedi←⍺⍳⍺,⍵produces a worthwhile speed-up, although only on a relatively small part of the overall computation.i←x⍳x,y cmpx '(⍴x)⌽i' 'x⍳y,x' (⍴x)⌽i → 5.00E¯4 | 0% ⎕⎕ x⍳y,x → 7.19E¯3 | +1337% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ - Both
⍳and⍋have special code for small range data.s←?1e6⍴5e5 ⍝ small range t←s ⋄ t[t⍳⌈/t]←2e9 ⍝ large range cmpx 's⍳s' 't⍳t' s⍳s → 5.87E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕ t⍳t → 2.00E¯2 | +240% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ cmpx '⍋s' '⍋t' ⍋s → 3.25E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ ⍋t → 3.84E¯2 | +18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ ⍋⍵has special code when⍵is a permutation.p←1e6?1e6 ⍝ p is a permutation q←p ⋄ q[999999]←⊃q ⍝ q is not; both are small-range cmpx '⍋p' '⍋q' ⍋p → 5.81E¯3 | 0% ⎕⎕⎕ * ⍋q → 5.71E¯2 | +882% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕- We saw previously that if
pis a permutation then⍋p ←→ p⍳⍳⍴p. The special code for⍋pmakes the two expressions run at roughly the same speed. The slight advantage for⍋⍋xversus(⍋x)⍳⍳⍴xwould increase if and when⍋⍋is recognized as an idiom.cmpx '⍋p' 'p⍳⍳⍴p' ⍋p → 6.02E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ p⍳⍳⍴p → 6.57E¯3 | +9% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ cmpx '⍋⍋x' '(⍋x)⍳⍳⍴x' ⍋⍋x → 3.16E¯2 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ (⍋x)⍳⍳⍴x → 3.25E¯2 | +2% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
A General Solution
Index-of works on cells rather than just scalars. Likewise, progressive index-of can also be extended to work on cells. The core algorithm remains the same. The generalization obtains by first reshaping ⍵ to have the same rank as ⍺ (having major cells with the same shape), applying the core algorithm, and then reshaping its result to have the same leading shape as the original ⍵. Thus:
pixd←{
m←≢⍺
r←0⌊1-⍴⍴⍺
n←×/r↓⍴⍵
i←⍺⍳⍺⍪(n,1↓⍴⍺)⍴⍵
(r↓⍴⍵) ⍴ ((⍋i)⍳⍳m) ⍳ ((⍋m⌽i)⍳⍳n)
}
xx yy
mmmm dddd
iiii iiii
ssss ssss
ssss mmmm
iiii iiii
ssss ssss
ssss ssss
iiii
pppp
pppp x
iiii mississippi
⍴xx ⍴yy y
11 4 7 4 dismiss
xx pixd yy x pixd y
11 1 2 0 4 3 5 11 1 2 0 4 3 5
xx pixd 3 5 4⍴yy x pixd 3 5⍴y
11 1 2 0 4 11 1 2 0 4
3 5 11 7 6 3 5 11 7 6
11 10 11 11 11 11 10 11 11 11
Postscript
After having written the above, I discovered an alternative exposition on progressive index-of by Bob Smith entitled Anatomy of an Idiom. Adám Brudzewsky has produced a Stack Exchange lesson and a Jupyter Notebook based on Smith’s text.
There is also an exposition in J on the same topic, with a more verbose but easier-to-understand derivation.

Follow