In `⍺ f⌸ ⍵`

, major cells of `⍺`

specify keys for the corresponding major cells of `⍵`

, and `f`

applies to each unique major cell of `⍺`

and the major cells of `⍵`

having that key. The monadic case `f⌸⍵`

is equivalent to `⍵ f⌸ ⍳≢⍵`

. *Key* is similar to the GROUP BY clause in SQL.

For example:

```
⎕io←0
(⍳11) ,[¯0.5] 'Mississippi'
0 1 2 3 4 5 6 7 8 9 10
M i s s i s s i p p i
{⍺}⌸'Mississippi' {⍺⍵}⌸'Mississippi'
Misp ┌─┬────────┐
│M│0 │
{⊂⍵}⌸'Mississippi' ├─┼────────┤
┌─┬────────┬───────┬───┐ │i│1 4 7 10│
│0│1 4 7 10│2 3 5 6│8 9│ ├─┼────────┤
└─┴────────┴───────┴───┘ │s│2 3 5 6 │
├─┼────────┤
{≢⍵}⌸'Mississippi' │p│8 9 │
1 4 4 2 └─┴────────┘
```

We illustrate *key* by using it to model some APL primitives and to motivate and solve a programming puzzle.

## Index-Of and Friends

Since *key* is defined in terms of *unique* and indices, we expect to be able to effect members of the *index-of* family. And so we can.

The specifications say that `f`

applies to each unique major cell of `⍺`

. Therefore, `{⍺}⌸`

and equivalently `⊣⌸`

computes *unique*. Since the specifications are for unique *major cells*, these are what `∪⍵`

would be if it were extended to non-vector arguments, analagous to the way `⍳`

was extended to non-vector left arguments.

```
x←↑'Aspen' 'John' 'Susan' 'Roger' 'Opal' 'John' 'Aspen'
x
Aspen
John
Susan
Roger
Opal
John
Aspen
{⍺}⌸x ⊣⌸x
Aspen Aspen
John John
Susan Susan
Roger Roger
Opal Opal
```

A little surprisingly, *key* can also be used to model extended *index-of* and extended *membership*. For reasons which will become clear, we model `∍`

(AKA `∊⍨`

) instead of `∊`

. Both models use the *merge* operator, whimsically denoted here by `→`

.

```
ix ← {(≢⍺) ↓ (0⍴⍨(≢⍺)+≢⍵) ((∊i)→)⍨ (≢¨i)/(≢⍺)⌊⊃¨i←{⊂⍵}⌸⍺⍪⍵}
has ← {(≢⍺) ↓ (0⍴⍨(≢⍺)+≢⍵) ((∊i)→)⍨ (≢¨i)/(≢⍺)>⊃¨i←{⊂⍵}⌸⍺⍪⍵}
y←↑'China' 'Susan' 'John' 'John' 'Anne' 'Roger'
y
China
Susan
John
John
Anne
Roger
x ix y
7 2 1 1 7 3
x has y
0 1 1 1 0 1
```

To focus on essentials, `ix`

and `has`

do not directly handle the case where the right argument has higher rank than the left argument (the principal argument). Such higher rank arguments can be handled by pre-application of `{↑,⊂⍤(¯1+⍴⍴⍺)⊢⍵}`

and post-application of `((1-⍴⍴⍺)↓⍴⍵)⍴`

.

## Partition

Since *key* can effect non-contiguous partitions, we expect to be able to use it to effect contiguous partitions, a special case of the former. And so we can:

```
part ← {(+\b/⍺){⊂⍵}⌸b⌿⍵⊣b←∨\⍺}
x
Aspen
John
Susan
Roger
Opal
John
Aspen
0 1 0 1 0 0 1 part x
┌─────┬─────┬─────┐
│John │Roger│Aspen│
│Susan│Opal │ │
│ │John │ │
└─────┴─────┴─────┘
0 1 0 1 0 0 1 (⊂[0] ≡ part) x
1
(7⍴0) (⊂[0] ≡ part) x
1
```

## Sort and Grade

```
sort ← {⍺⌿⍨¯1+{≢⍵}⌸⍺⍪⍵}
grade ← {(≢⍺)-⍨∊1↓¨{⊂⍵}⌸⍺⍪⍵}
```

If `⍵`

is an array from a small “alphabet” `⍺`

, then `⍺ sort ⍵`

sorts `⍵`

and `⍺ grade ⍵`

grades it. For example:

```
x←?1e6⍴256 a←⎕a[?1e6⍴26]
(⍋x) ≡ (⍳256) grade x (⍋a) ≡ ⎕a grade a
1 1
x[⍋x] ≡ (⍳256) sort x a[⍋a] ≡ ⎕a sort a
1 1
```

## The Maximum of a Vector

In Dyalog ’13 session D11, I included a slide giving relative timings on the then new *key* operator:

```
x←?1e6⍴1000
a. ¯1+{≢⍵}⌸(⍳1000),x 1.00
b. ¯1+{≢⍵}⌸(⍳1+⌈/x),x 1.62
c. ...
```

The timings show that *key* with *tally* takes less than a factor of 2 over finding the maximum. This is fast; unbelievably fast even. The following puzzle arose from investigations into this timing: Find the maximum of a vector of unsigned 1-byte integers without using multicore, vector instructions, loop unrolling, etc. Can you do it faster in C than the following?

```
max=*x++;
for(i=1;i<n;++i){if(max<*x)max=*x; ++x;}
```

I posed the puzzle to some expert C programmers, and they failed to solve it. An “obvious” solution obtains by considering an obvious implementation of `{≢⍵}⌸x`

in C:

M=256; memset(c,0,M*4); // initialize table of counts for(i=0;i<n;++i)c[*x++]++; // analoguous toc[x]+←1in APL

Therefore, one way to find the maximum of x is to have an M-element (byte) Boolean table b, and:

memset(b,0,M); for(i=0;i<n;++i)b[*x++]=1; // analoguous tob[x]←1in APL c=b+M; // scan from the back for(i=0;i<M;++i)if(*--c){max=c-b; break;}

Timing show that the table method is faster than the comparison method by a factor of 1.5 for 1-byte integers and 1.3 for 2-byte integers. The table method requires more code, but the main `n`

-loop, the time for which dominates the overall time, is simpler for the table method.

Jay Foad countered that finding the maximum (and the minimum) is much faster still with the vector instructions available in modern CPUs. Dyalog has now (version 14.1) implemented such finding, with performance benefits for *key*, the index-of family, grade, and sort.