# Progressive Index-Of

`⎕io=0 `is assumed throughout.

A recent Forum post motivated investigations into the progressive index-of functions in the FinnAPL Idiom Library:

``````pix  ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)}   ⍝ FinnAPL Idiom 1
pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)}   ⍝ FinnAPL Idiom 5
``````

In this note, we:

• explain what is progressive index-of
• explain why the two functions work
• investigate the performance of the two functions
• provide a more general solution

## Progressive Index-Of

Progressive index-of is like index-of (`⍳`) except that each find “uses up” the target of that find. There are no duplicates in the result with the possible exception of `≢⍺` (for “not found”). Thus:

``````      x←'mississippi'
y←'dismiss'

x pix y
11 1 2 0 4 3 5
``````

The following chart illustrates a step-by-step derivation of each progressive index:

``````0 1 2 3 4 5 6 7 8 9 10

m i s s i s s i p p  i      d i s m i s s
11
m i s s i s s i p p  i      d i s m i s s
11 1
m i s s i s s i p p  i      d i s m i s s
11 1 2
m i s s i s s i p p  i      d i s m i s s
11 1 2 0
m i s s i s s i p p  i      d i s m i s s
11 1 2 0 4
m i s s i s s i p p  i      d i s m i s s
11 1 2 0 4 3
m i s s i s s i p p  i      d i s m i s s
11 1 2 0 4 3 5
``````

It is possible to compute the progressive index without looping or recursion, as the two FinnAPL functions demonstrate.

## Why It Works

The basic idea of `⍺ pix ⍵` is to substitute for each item of `⍺` and `⍵` an equivalent representative, collectively `c` and `d`, whence the result obtains as `c⍳d`. The equivalent representative used here is ranking, specifically the ranking of the indices in `⍺`.

The ranking of an array `⍵` is a permutation of order `≢⍵`. The smallest major cell is assigned 0; the next smallest is assigned 1; and so on. Ties are resolved by favoring the earlier-occurring cell. The ranking can be computed by `⍋⍋⍵`. For example:

``````      x ⍪ ⍉⍪ ⍋⍋x
m i s s i s  s i p p i
4 0 7 8 1 9 10 2 5 6 3

y ⍪ ⍉⍪ ⍋⍋y
d i s m i s s
0 1 4 3 2 5 6
``````

`⍺ pix ⍵` works on two different rankings of indices in `⍺`:

`⍋⍋⍺⍳⍺,⍵    `rankings of indices in `⍺` of `⍺` and `⍵`, favoring `⍺`
`⍋⍋⍺⍳⍵,⍺    `rankings of indices in `⍺` of `⍵` and `⍺`, favoring `⍵`

The first `⍴⍺` items of the former are those for `⍺` and the first `⍴⍵` of the latter are those for `⍵`, and we get

``pix ← {((⍴⍺)⍴⍋⍋⍺⍳⍺,⍵) ⍳ ((⍴⍵)⍴⍋⍋⍺⍳⍵,⍺)}``

The second version depends on the following properties of permutations. Let `p` be a permutation. Then `p[⍋p] ←→ ⍳≢p`, the identity permutation, and therefore `⍋p` is the inverse of `p`. Furthermore, `p[p⍳⍳≢p] ←→ ⍳≢p` and so `p⍳⍳≢p` is also the inverse of `p`. The inverse is unique (that’s why it’s called the inverse), therefore `⍋p ←→ p⍳⍳≢p`.

``````      p←97?97         ⍝ a random permutation

p[⍋p]    ≡ ⍳≢p
1
p[p⍳⍳≢p] ≡ ⍳≢p
1
(⍋p)     ≡ p⍳⍳≢p
1
``````

The two rankings are permutations (because the leftmost functions are `⍋`) and we just need the first `⍴⍺` items of the former and the first `⍴⍵` items of the latter. Thus:

``````pixa ← {((⍋⍺⍳⍺,⍵)⍳⍳⍴⍺) ⍳ ((⍋⍺⍳⍵,⍺)⍳⍳⍴⍵)}
``````

## Performance

We note that both versions of `pix` contain the expressions `⍺⍳⍺,⍵` and `⍺⍳⍵,⍺`, but the latter is just a rotation of the former. Thus:

``````pixb ← {i←⍺⍳⍺,⍵ ⋄ ((⍴⍺)⍴⍋⍋i) ⍳ ((⍴⍵)⍴⍋⍋(⍴⍺)⌽i)}
pixc ← {i←⍺⍳⍺,⍵ ⋄ ((⍋i)⍳⍳⍴⍺) ⍳ ((⍋(⍴⍺)⌽i)⍳⍳⍴⍵)}
``````

Which is faster? The answer may surprise.

``````      x←?1e6⍴3e5
y←?2e5⍴3e5

cmpx 'x pixb y' 'x pixc y'
x pixb y → 9.15E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
x pixc y → 9.21E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
``````

A few factors about the Dyalog APL interpreter are relevant to this performance:

• Computing `⍺⍳⍵,⍺` as a rotation of an already computed `i←⍺⍳⍺,⍵` produces a worthwhile speed-up, although only on a relatively small part of the overall computation.
``````      i←x⍳x,y
cmpx '(⍴x)⌽i' 'x⍳y,x'
(⍴x)⌽i → 5.00E¯4 |     0% ⎕⎕
x⍳y,x  → 7.19E¯3 | +1337% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
``````
• Both `⍳` and `⍋` have special code for small range data.
``````      s←?1e6⍴5e5           ⍝ small range
t←s ⋄ t[t⍳⌈/t]←2e9   ⍝ large range

cmpx 's⍳s' 't⍳t'
s⍳s → 5.87E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕
t⍳t → 2.00E¯2 | +240% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

cmpx '⍋s' '⍋t'
⍋s → 3.25E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
⍋t → 3.84E¯2 | +18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
``````
• `⍋⍵` has special code when `⍵` is a permutation.
``````      p←1e6?1e6           ⍝ p is a permutation
q←p ⋄ q[999999]←⊃q  ⍝ q is not; both are small-range

cmpx '⍋p' '⍋q'
⍋p → 5.81E¯3 |    0% ⎕⎕⎕
* ⍋q → 5.71E¯2 | +882% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
``````
• We saw previously that if `p` is a permutation then `⍋p ←→ p⍳⍳⍴p`. The special code for `⍋p` makes the two expressions run at roughly the same speed. The slight advantage for `⍋⍋x` versus `(⍋x)⍳⍳⍴x` would increase if and when `⍋⍋` is recognized as an idiom.
``````      cmpx '⍋p' 'p⍳⍳⍴p'
⍋p    → 6.02E¯3 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
p⍳⍳⍴p → 6.57E¯3 | +9% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

cmpx '⍋⍋x' '(⍋x)⍳⍳⍴x'
⍋⍋x      → 3.16E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(⍋x)⍳⍳⍴x → 3.25E¯2 | +2% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

``````

## A General Solution

Index-of works on cells rather than just scalars. Likewise, progressive index-of can also be extended to work on cells. The core algorithm remains the same. The generalization obtains by first reshaping `⍵` to have the same rank as `⍺` (having major cells with the same shape), applying the core algorithm, and then reshaping its result to have the same leading shape as the original `⍵`. Thus:

``````pixd←{
m←≢⍺
r←0⌊1-⍴⍴⍺
n←×/r↓⍴⍵
i←⍺⍳⍺⍪(n,1↓⍴⍺)⍴⍵
(r↓⍴⍵) ⍴ ((⍋i)⍳⍳m) ⍳ ((⍋m⌽i)⍳⍳n)
}

xx              yy
mmmm            dddd
iiii            iiii
ssss            ssss
ssss            mmmm
iiii            iiii
ssss            ssss
ssss            ssss
iiii
pppp
pppp                                  x
iiii                               mississippi

⍴xx             ⍴yy                y
11 4            7 4                dismiss

xx pixd yy                         x pixd y
11 1 2 0 4 3 5                     11 1 2 0 4 3 5

xx pixd 3 5 4⍴yy                   x pixd 3 5⍴y
11  1  2  0  4                     11  1  2  0  4
3  5 11  7  6                      3  5 11  7  6
11 10 11 11 11                     11 10 11 11 11
``````

Postscript
After having written the above, I discovered an alternative exposition on progressive index-of by Bob Smith entitled Anatomy of an Idiom. Adám Brudzewsky has produced a Stack Exchange lesson and a Jupyter Notebook based on Smith’s text.

There is also an exposition in J on the same topic, with a more verbose but easier-to-understand derivation.