Stencil Maneuvers

Introduction

The e-mail arrived in the early afternoon from Morten, in Finland attending the FinnAPL Forest Seminar.

How do I speed this up and impress the Finns?

0 cmpx 'e←⊃∨/0.2 edges¨r g b'
6.4E¯1
edges
{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply ⍵)÷1⌈(+⌿÷≢),⍵}
apply
{stencil←⍺ ⋄ {+/,⍵×stencil}⌺(⍴stencil)⊢⍵}
EdgeDetect
¯1 ¯1 ¯1
¯1 8 ¯1
¯1 ¯1 ¯1

(r g b in the attached ws)

Background

⌺ is stencil, a new dyadic operator which will be available in version 16.0. In brief, f⌺s applies the operand function f to windows of size s. The window is moved over the argument, centered over every possible position. The size is commonly odd and the movement commonly 1. For example:

   {⊂⍵}⌺5 3⊢6 5⍴⍳30
┌───────┬────────┬────────┬────────┬───────┐
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
├───────┼────────┼────────┼────────┼───────┤
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
├───────┼────────┼────────┼────────┼───────┤
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
├───────┼────────┼────────┼────────┼───────┤
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
├───────┼────────┼────────┼────────┼───────┤
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
├───────┼────────┼────────┼────────┼───────┤
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
└───────┴────────┴────────┴────────┴───────┘
   {+/,⍵}⌺5 3⊢6 5⍴⍳30
 39  63  72  81  57
 72 114 126 138  96
115 180 195 210 145
165 255 270 285 195
152 234 246 258 176
129 198 207 216 147

In addition, for matrix right arguments with movement 1, special code is provided for the following operand functions:

{∧/,⍵}   {∨/,⍵}   {=/,⍵}   {≠/,⍵} 

{⍵}      {,⍵}     {⊂⍵}     {+/,⍵}

{+/,A×⍵}     {E<+/,A×⍵}       
{+/⍪A×⍤2⊢⍵}  {E<+/⍪A×⍤2⊢⍵}

The comparison < can be one of < ≤ = ≠ ≥ >.

The variables r g b in the problem are integer matrices with shape 227 316 having values between 0 and 255. For present purposes we can initialize them to random values:

   r←¯1+?227 316⍴256
   g←¯1+?227 316⍴256
   b←¯1+?227 316⍴256

Opening Moves

I had other obligations and did not attend to the problem until 7 PM. A low-hanging fruit was immediately apparent: {+/,⍵×A}⌺s is not implemented by special code but {+/,A×⍵}⌺s is. The difference in performance is significant:

   edges1←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply1 ⍵)÷1⌈(+⌿÷≢),⍵}
   apply1←{stencil←⍺ ⋄ {+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx 'e←⊃∨/0.2 edges1¨r g b' 'e←⊃∨/0.2 edges ¨r g b'
  e←⊃∨/0.2 edges1¨r g b → 8.0E¯3 |     0% ⎕
  e←⊃∨/0.2 edges ¨r g b → 6.1E¯1 | +7559% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

I fired off an e-mail reporting this good result, then turned to other more urgent matters. An example of the good being an enemy of the better, I suppose.

When I returned to the problem at 11 PM, the smug good feelings had largely dissipated:

  • Why should {+/,A×⍵}⌺s be so much faster than {+/,⍵×A}⌺s? That is, why is there special code for the former but not the latter? (Answer: all the special codes end with … ⍵}.)
  • I can not win: If the factor is small, then why isn’t it larger; if it is large, it’s only because the original code wasn’t very good.
  • The large factor is because, for this problem, C (special code) is still so much faster than APL (magic function).
  • If the absolute value can somehow be eliminated, the operand function can then be in the form {E<+/,A×⍵}⌺s, already implemented by special code. Alternatively, {E<|+/,A×⍵}⌺s can be implemented by new special code.

Regarding the last point, the potential performance improvement can be gauged as follows:

   edges2←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺ EdgeDetect)apply2 ⍵}
   apply2←{threshold stencil←⍺ ⋄ 
                    {threshold<+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨'21 ',¨⊂'¨r g b'
  e←⊃∨/0.2 edges2¨r g b → 4.8E¯3 |      0%
* e←⊃∨/0.2 edges1¨r g b → 7.5E¯3 |    +57%
* e←⊃∨/0.2 edges ¨r g b → 7.4E¯1 | +15489% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The * in the result of cmpx indicates that the expressions don’t give the same results. That is expected; edges2 is not meant to give the same results, only to give a sense of the potential performance difference. (Here, an additional factor of 1.57.)

edges1 has the expression ⍺<matrix÷1⌈(+⌿÷≢),⍵ where ⍺ and the divisor are both scalars. Reordering the expression eliminates one matrix operation: (⍺×1⌈(+⌿÷≢),⍵)<matrix. Thus:

   edges1a←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|EdgeDetect apply1 ⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨('1a' '1 ' '  '),¨⊂'¨r g b'
  e←⊃∨/0.2 edges1a¨r g b → 6.3E¯3 |      0%
  e←⊃∨/0.2 edges1 ¨r g b → 7.6E¯3 |    +22%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +10232% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The time was almost 1 AM. To bed.

Further Maneuvers

Fresh and promising ideas came with the morning. The following discussion applies to the operand function {+/,A×⍵}.

(0) Scalar multiple: If all the elements of A are equal, then {+/,A×⍵}⌺(⍴A)⊢r ←→ (⊃A)×{+/,⍵}⌺(⍴A)⊢r.

   A←3 3⍴?17
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (⊃A)×{+/,⍵}⌺(⍴A)⊢r
1

(1) Sum v inner product: {+/,⍵}⌺(⍴A)⊢r is significantly faster than {+/,A×⍵}⌺(⍴A)⊢r because the former exploits mathematical properties absent from the latter.

   A←?3 3⍴17
   cmpx '{+/,⍵}⌺(⍴A)⊢r' '{+/,A×⍵}⌺(⍴A)⊢r'
  {+/,⍵}⌺(⍴A)⊢r   → 1.4E¯4 |    0% ⎕⎕⎕
* {+/,A×⍵}⌺(⍴A)⊢r → 1.3E¯3 | +828% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

(The * in the result of cmpx is expected.)

(2) Linearity: the stencil of the sum equals the sum of the stencils.

   A←?3 3⍴17
   B←?3 3⍴17
   ({+/,(A+B)×⍵}⌺(⍴A)⊢r) ≡ ({+/,A×⍵}⌺(⍴A)⊢r) + {+/,B×⍵}⌺(⍴A)⊢r 
1

(3) Middle: If B is zero everywhere except the middle, then {+/,B×⍵}⌺(⍴B)⊢r ←→ mid×r where mid is the middle value.

   B←(⍴A)⍴0 0 0 0 9
   B
0 0 0
0 9 0
0 0 0
   ({+/,B×⍵}⌺(⍴B)⊢r) ≡ 9×r
1

(4) A faster solution.

   A←EdgeDetect
   B←(⍴A)⍴0 0 0 0 9
   C←(⍴A)⍴¯1
   A B C
┌────────┬─────┬────────┐
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
│¯1  8 ¯1│0 9 0│¯1 ¯1 ¯1│
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
└────────┴─────┴────────┘
   A ≡ B+C
1

Whence:

   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) + {+/,C×⍵}⌺(⍴A)⊢r ⍝ (2)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) - {+/,⍵}⌺(⍴A)⊢r   ⍝ (0)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (9×r) - {+/,⍵}⌺(⍴A)⊢r                ⍝ (3)
1

Putting it all together:

edges3←{
  ⍺←0.7
  mid←⊃EdgeDetect↓⍨⌊2÷⍨⍴EdgeDetect
  1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|(⍵×1+mid)-{+/,⍵}⌺(⍴EdgeDetect)⊢⍵
}

Comparing the various edges:

   x←(⊂'e←⊃∨/0.2 edges'),¨('3 ' '2 ' '1a' '1 ' '  '),¨⊂'¨r g b'
   cmpx x
  e←⊃∨/0.2 edges3 ¨r g b → 3.4E¯3 |      0%
* e←⊃∨/0.2 edges2 ¨r g b → 4.3E¯3 |    +25%
  e←⊃∨/0.2 edges1a¨r g b → 6.4E¯3 |    +88%
  e←⊃∨/0.2 edges1 ¨r g b → 7.5E¯3 |   +122%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +19022% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
edges      original
edges1     uses {+/,stencil×⍵} instead of {+/,⍵×stencil}
edges1a    uses (⍺×mean,⍵)<matrix instead of ⍺<matrix÷mean,⍵
edges2     uses {threshold<+/,stencil×⍵}
edges3     uses the faster {+/,⍵} instead of {+/,stencil×⍵} and other maneuvers

Fin

I don’t know if the Finns would be impressed. The exercise has been amusing in any case.

Calculation v Look-Up

(⎕io←0 and timings are done in Dyalog version 15.0.)

Table Look-Up

Some functions can be computed faster by table look-up than by a more conventional calculation. For example:

      b←?1e6⍴2

      cmpx '*b' '(*0 1)[b]'
 *b        → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*0 1)[b] → 1.23E¯3 | -91% ⎕⎕⎕

Some observations about this benchmark:

(a) The advantage of table look-up depends on the time to calculate the function directly, versus the time to do indexing, ⍺[⍵] or ⍵⌷⍺, plus a small amount of time to calculate the function on a (much) smaller representative set.

Indexing is particularly efficient when the indices are boolean:

      bb←b,1
      bi←b,2
      cmpx '2 3[bb]' '2 3 3[bi]'
 2 3[bb]   → 1.12E¯4 |    0% ⎕⎕⎕⎕⎕
 2 3 3[bi] → 7.39E¯4 | +558% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

bi is the same as bb except that its last element, 2, forces it to be represented as 1-byte integers instead of the 1-bit booleans used for bb.

(b) Although the faster expression works best with 0-origin, the timing for 1-origin is similar.

      cmpx '*b' '{⎕io←0 ⋄ (*0 1)[⍵]}b'
 *b                   → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 {⎕io←0 ⋄ (*0 1)[⍵]}b → 1.22E¯3 | -91% ⎕⎕⎕

That is, if the current value of ⎕io is 1, the implementation can use a local ⎕io setting of 0.

(c) The fact that *b is currently slower than (*0 1)[b] represents a judgment by the implementers that *b is not a sufficiently common computation to warrant writing faster code for it. Other similar functions already embody the table look-up technique:

      cmpx '⍕b' '¯1↓,(2 2⍴''0 1 '')[b;]'
 ⍕b                   → 6.95E¯4 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ¯1↓,(2 2⍴'0 1 ')[b;] → 6.92E¯4 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Small Domains

Table look-up works best for booleans, but it also works for other small domains such as 1-byte integers:

      i1←?1e6⍴128    ⍝ 1-byte integers

      cmpx '*i1' '(*⍳128)[i1]'
 *i1         → 1.48E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*⍳128)[i1] → 9.30E¯4 | -94% ⎕⎕

      cmpx '○i1' '(○⍳128)[i1]'
 ○i1         → 2.78E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (○⍳128)[i1] → 9.25E¯4 | -67% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

In general, a table for 1-byte integers needs to have values for ¯128+⍳256. Here ⍳128 is used to simulate the fact that, in C, indexing can be done on 1-byte indices without extra computation on the indices. In APL, general 1-byte indices are used as table[i1+128]; in C, the expression is (table+offset)[i].
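
To sketch the general pattern (i1f and table are illustrative names, not from the original), a full-range 1-byte argument can be looked up in a 256-element table after adding the origin offset:

      i1f←¯128+?1e6⍴256        ⍝ full-range 1-byte integers
      table←*¯128+⍳256         ⍝ * precomputed on every possible 1-byte value
      (*i1f) ≡ table[i1f+128]  ⍝ the offset maps ¯128..127 onto indices 0..255
1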

Domains considered “small” get bigger all the time as machines come equipped with ever larger memories.

Larger Domains

Table look-up is applicable when the argument is not a substantial subset of a large domain and there is a fast way to detect that it is not.

      i4s←1e7+?1e6⍴1e5    ⍝ small-range 4-byte integers

      G←{(⊂u⍳⍵)⌷⍺⍺ u←∪⍵}

      cmpx '⍟i4s' '⍟G i4s'
 ⍟i4s   → 1.44E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ⍟G i4s → 9.59E¯3 | -34% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The definition of G reflects that the argument is not used directly as indices (as in (⊂⍵)⌷⍺⍺ u) but must first be looked up, the u⍳⍵ in (⊂u⍳⍵)⌷⍺⍺ u. Thus both index-of and indexing are key enablers for making table look-up competitive.

§4.3 of Notation as a Tool of Thought presents a different way to distribute the results of a calculation on the “nub” (unique items). But it takes more time and space and is limited to numeric scalar results.

      H←{(⍺⍺ u)+.×(u←∪⍵)∘.=⍵}

      i4a←1e7+?1e4⍴1e3    ⍝ small-range 4-byte integers

      cmpx '⍟i4a' '⍟G i4a' '⍟H i4a'
 ⍟i4a   → 1.56E¯4 |       0%
 ⍟G i4a → 6.25E¯5 |     -60%
 ⍟H i4a → 1.78E¯1 | +113500% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The benchmark is done on the smaller i4a because a benchmark on i4s founders on the ≢∪⍵ by ≢⍵ boolean matrix created by H (for i4s, 12.5 GB = ×/ 0.125 1e5 1e6).

Mathematical Background

“New Math” teaches that a function is a set of ordered pairs whose first items are unique:

*⍵: {(⍵,*⍵)|⍵∊C}      The set of all (⍵,*⍵) where ⍵ is a complex number
⍟⍵: {(⍵,⍟⍵)|⍵∊C~{0}}  The set of all (⍵,⍟⍵) where ⍵ is a non-zero complex number
○⍵: {(⍵,○⍵)|⍵∊C}      The set of all (⍵,○⍵) where ⍵ is a complex number
⍕⍵: {(⍵,⍕⍵)|0=⍴⍴⍵}    The set of all (⍵,⍕⍵) where ⍵ is a scalar

When ⍵ is restricted (for example) to the boolean domain, the functions, the sets of ordered pairs, are more simply represented by enumerating all the possibilities:

*⍵: {(0,1), (1,2.71828)}
⍟⍵: {(0,Err), (1,0)}
○⍵: {(0,0), (1,3.14159)}
⍕⍵: {(0,'0'), (1,'1')}

Realized and Potential Performance Improvements

realized (available no later than 15.0)

   boolean                      ⍕ ∘.f
   1-byte integer               ⍕     a∘⍕

potential (non-exhaustive list)

   boolean                      * ⍟ | - ○   ! ⍳ a∘f f.g
   1-byte integer               * ⍟ | - ○ × ! ⍳ a∘f f.g ∘.f
   2-byte integer               * ⍟ | - ○ ×   ⍳ a∘f
   small-range 4-byte integer   * ⍟ |     ×

a is a scalar; f and g are primitive scalar dyadic functions.

APL Exercises

These exercises are designed to introduce APL to a high school senior adept in mathematics. The entire set of APL Exercises has the following sections:

Introduction
0. Beginnings
1. Utilities
2. Recursion
3. Proofs
4. Rank Operator
5. Index-Of
6. Key Operator
7. Indexing
8. Grade and Sort
9. Power Operator
10. Arithmetic
11. Combinatorial Objects
The Fine Print

Sections 2 and 10 are the most interesting ones and are as follows. The full set is too lengthy (and too boring?) to be included here. Some students may be able to make up their own exercises from the list of topics. As well, it may be useful to know what the “experts” think are important aspects of APL.

2. Recursion

⎕io←0 throughout.

2.0 Factorial

Write a function to compute the factorial function on integer ⍵, ⍵≥0.

      fac¨ 5 6 7 8
120 720 5040 40320

      n←1+?20
      (fac n) = n×fac n-1
1
      fac 0
1
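
One possible definition, a recursive sketch (any definition with the behaviour above will do):

      fac←{0=⍵:1 ⋄ ⍵×∇ ⍵-1}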

2.1 Fibonacci Numbers

Write a function to compute the ⍵-th Fibonacci number, ⍵≥0.

      fib¨ 5 6 7 8
5 8 13 21    

      n←2+?20
      (fib n) = (fib n-2) + (fib n-1)
1
      fib 0
0
      fib 1
1

If the function written above is multiply recursive, write a version which is singly recursive (only one occurrence of ∇). Use cmpx to compare the performance of the two functions on the argument 35.
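
Possible sketches: the multiply-recursive definition is the one that reappears in “A Memo Operator” below; the singly-recursive fib1 (an illustrative name) carries the pair (fib ⍵)(fib 1+⍵) through the recursion.

      fib ←{1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}          ⍝ multiply recursive
      fib1←{⊃{0=⍵:0 1 ⋄ (⊢/,+/)∇ ⍵-1}⍵}     ⍝ singly recursive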

2.2 Peano Addition

Let ⍺ and ⍵ be natural numbers (non-negative integers). Write a function padd to compute ⍺+⍵ without using + itself (or - or × or ÷ …). The only functions allowed are comparison to 0 and the functions pre←{⍵-1} and suc←{⍵+1}.
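
A sketch, assuming pre and suc as given:

      padd←{0=⍵:⍺ ⋄ (suc ⍺)∇ pre ⍵}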

2.3 Peano Multiplication

Let ⍺ and ⍵ be natural numbers (non-negative integers). Write a function ptimes to compute ⍺×⍵ without using × itself (or + or - or ÷ …). The only functions allowed are comparison to 0, the functions pre←{⍵-1} and suc←{⍵+1}, and padd.
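
A sketch, assuming pre and padd as above:

      ptimes←{0=⍵:0 ⋄ ⍺ padd ⍺ ∇ pre ⍵}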

2.4 Binomial Coefficients

Write a function to produce the binomial coefficients of order ⍵, ⍵≥0.

      bc 0
1
      bc 1
1 1
      bc 2
1 2 1
      bc 3
1 3 3 1
      bc 4
1 4 6 4 1
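
One possible sketch, using the Pascal’s triangle recurrence (each row is the previous row, shifted one place each way, and summed):

      bc←{0=⍵:,1 ⋄ (0,p)+(p←∇ ⍵-1),0}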

2.5 Cantor Set

Write a function to compute the Cantor set of order ⍵, ⍵≥0.

      Cantor 0
1
      Cantor 1
1 0 1
      Cantor 2
1 0 1 0 0 0 1 0 1
      Cantor 3
1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1

The classical version of the Cantor set starts with the interval [0,1] and at each stage removes the middle third from each remaining subinterval:

[0,1] →
[0,1/3] ∪ [2/3,1] →
[0,1/9] ∪ [2/9,1/3] ∪ [2/3,7/9] ∪ [8/9,1] → ….

      {(+⌿÷≢)Cantor ⍵}¨ 3 5⍴⍳15
1         0.666667  0.444444   0.296296   0.197531  
0.131687  0.0877915 0.0585277  0.0390184  0.0260123 
0.0173415 0.011561  0.00770735 0.00513823 0.00342549

      (2÷3)*3 5⍴⍳15
1         0.666667  0.444444   0.296296   0.197531  
0.131687  0.0877915 0.0585277  0.0390184  0.0260123 
0.0173415 0.011561  0.00770735 0.00513823 0.00342549
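
One possible sketch: each stage is the previous stage, followed by an equal length of zeros, followed by the previous stage again, obtainable as an outer product with 1 0 1:

      Cantor←{0=⍵:,1 ⋄ ,1 0 1∘.∧∇ ⍵-1}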

2.6 Sierpinski Carpet

Write a function to compute the Sierpinski Carpet of order ⍵, ⍵≥0.

      SC 0
1
      SC 1
1 1 1
1 0 1
1 1 1
      SC 2
1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1
1 0 1 0 0 0 1 0 1
1 1 1 0 0 0 1 1 1
1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 1 0 1
1 1 1 1 1 1 1 1 1

      ' ⌹'[SC 3]
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹         ⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹         ⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹         ⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹         ⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹         ⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹

3-D analogs of the Cantor set and the Sierpinski carpet are the Sierpinski sponge or the Menger sponge.
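
One possible sketch: each stage is a 3×3 block arrangement of the previous stage with the center block zeroed, assembled with the ⊃⍪⌿,/ idiom:

      SC←{0=⍵:1 1⍴1 ⋄ ⊃⍪⌿,/(3 3⍴1 1 1 1 0 1 1 1 1)×⊂∇ ⍵-1}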

2.7 Extended H

Write a function xH for the extended H algorithm for ⍵, ⍵≥0, which embeds the complete binary tree of depth ⍵ on the plane. In xH ⍵ there are 2*⍵ leaves, instances of the letter o with exactly one neighbor (or no neighbors for 0=⍵). The root is at the center of the matrix.

      xH¨ ⍳5
┌─┬─────┬─────┬─────────────┬─────────────┐
│o│o-o-o│o   o│o-o-o   o-o-o│o   o   o   o│
│ │     │|   |│  |       |  │|   |   |   |│
│ │     │o-o-o│  o---o---o  │o-o-o   o-o-o│
│ │     │|   |│  |       |  │| | |   | | |│
│ │     │o   o│o-o-o   o-o-o│o | o   o | o│
│ │     │     │             │  |       |  │
│ │     │     │             │  o---o---o  │
│ │     │     │             │  |       |  │
│ │     │     │             │o | o   o | o│
│ │     │     │             │| | |   | | |│
│ │     │     │             │o-o-o   o-o-o│
│ │     │     │             │|   |   |   |│
│ │     │     │             │o   o   o   o│
└─┴─────┴─────┴─────────────┴─────────────┘

Write a function that has the same result as xH but checks the result using the assert utility. For example:

assert←{⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕SIGNAL 8 ⋄ shy←0}

xH1←{
  h←xH ⍵
  assert 2=⍴⍴h:
  assert h∊'o -|':
  assert (¯1+2*1+⍵)=+/'o'=,h:
  ...
  h
}

2.8 Tower of Hanoi

(Image: the Tower of Hanoi. By André Karwath aka Aka (Own work) [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons)

The Tower of Hanoi problem is to move a set of different-sized disks from one peg to another, moving one disk at a time, using an intermediate peg if necessary. At all times no larger disk may sit on top of a smaller disk.

Write a function Hanoi to solve the problem of moving ⍵ disks from peg 0 to peg 2. Since the disk moved is always the one at the top of a peg, the solution can be stated as a 2-column matrix with column 0 indicating the source peg and column 1 the destination peg.

      Hanoi¨ ⍳5
┌───┬───┬───┬───┬───┐
│   │0 2│0 1│0 2│0 1│
│   │   │0 2│0 1│0 2│
│   │   │1 2│2 1│1 2│
│   │   │   │0 2│0 1│
│   │   │   │1 0│2 0│
│   │   │   │1 2│2 1│
│   │   │   │0 2│0 1│
│   │   │   │   │0 2│
│   │   │   │   │1 2│
│   │   │   │   │1 0│
│   │   │   │   │2 0│
│   │   │   │   │1 2│
│   │   │   │   │0 1│
│   │   │   │   │0 2│
│   │   │   │   │1 2│
└───┴───┴───┴───┴───┘

Prove that Hanoi ⍵ has ¯1+2*⍵ rows.
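
One possible recursive sketch: to move n disks from peg f to peg t, move n-1 disks to the spare peg 3-f+t, move the largest disk from f to t, then move the n-1 disks from the spare peg onto it. (The names n, f, and t are illustrative locals.)

      Hanoi←{{(n f t)←⍵ ⋄ 0=n:0 2⍴0 ⋄ (∇(n-1)f(3-f+t))⍪(1 2⍴f t)⍪∇(n-1)(3-f+t)t}⍵ 0 2}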

10. Arithmetic

⎕io←0 throughout.

10.0 Primality Testing

Write a function prime ⍵ which is 1 or 0 depending on whether non-negative integer ⍵ is a prime number. For example:

      prime 1
0
      prime 2
1
      prime 9
0
      prime¨ 10 10⍴⍳100
0 0 1 1 0 1 0 1 0 0
0 1 0 1 0 0 0 1 0 1
0 0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 1 0 0
0 1 0 1 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 1 0 0
0 1 0 1 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 1 0 0
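
One possible sketch, counting divisors (a prime has exactly two):

      prime←{2=+/0=(⍳1+⍵)|⍵}    ⍝ exactly two of the residues (⍳1+⍵)|⍵ are 0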

10.1 Primes Less Than n

Write a function primes ⍵ using the sieve method which produces all the primes less than ⍵. Use cmpx to compare the speed of primes n and (prime¨⍳n)/⍳n. For example:

      primes 7
2 3 5
      primes 50
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

      (prime¨⍳50)/⍳50
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
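
One possible sieve sketch; marking the multiples of every k up to ⌈⍵*0.5, whether or not k is prime, is redundant but harmless:

primes←{
  b←2≤⍳⍵                                   ⍝ 0 and 1 are not prime
  (b/⍳⍵) ⊣ {b[⍵×2↓⍳⌈(≢b)÷⍵]←0}¨2↓⍳⌈⍵*0.5   ⍝ for each k, mark 2k 3k … below ⍵ as not prime
}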

10.2 Factoring

Write a function factor which produces the prime factorization of ⍵, such that ⍵ = ×/factor ⍵ and ∧/ prime¨ factor ⍵ is 1. For example:

      factor 144
2 2 2 2 3 3
      ×/ factor 144
144
      ∧/ prime¨ factor 144
1

      factor 1

      factor 2
2
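
One possible sketch: peel off the smallest divisor d≥2 (necessarily prime) and recur on ⍵÷d:

      factor←{1≥⍵:⍬ ⋄ d←⊃c/⍨0=⍵|⍨c←2↓⍳1+⍵ ⋄ d,∇ ⍵÷d}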

10.3 pco

Read about the pco function in http://dfns.dyalog.com/n_pco.htm and experiment with it.

      )copy dfns pco

      pco ⍳7
2 3 5 7 11 13 17

      1 pco ⍳10
0 0 1 1 0 1 0 1 0 0

G.H. Hardy states in A Mathematician’s Apology (chapter 14, page 23) that the number of primes less than 1e9 is 50847478. Use pco to check whether this is correct.

10.4 GCD

Write a function ⍺ gcd ⍵ to compute the greatest common divisor of non-negative integers ⍺ and ⍵ using the Euclidean algorithm.

Write a function ⍺ egcd ⍵ to express the GCD of ⍺ and ⍵ as a linear combination of ⍺ and ⍵, that is, coefficients such that (⍺ gcd ⍵) = (⍺,⍵)+.×⍺ egcd ⍵.

      112 gcd 144
16
      112 egcd 144
4 ¯3
      112 144 +.× 112 egcd 144
16
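
Possible sketches: gcd is the classical Euclidean recursion; egcd carries the coefficients back up through the recursion (if c d are the coefficients for ⍵ and ⍵|⍺, then d and c-d×⌊⍺÷⍵ are the coefficients for ⍺ and ⍵):

      gcd ←{0=⍵:⍺ ⋄ ⍵∇⍵|⍺}
      egcd←{0=⍵:1 0 ⋄ (c d)←⍵∇⍵|⍺ ⋄ d,c-d×⌊⍺÷⍵}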

10.5 Ring Extension

Z[√k] is the ring of integers Z extended with √k where k is not a perfect square. The elements of Z[√k] are ordered pairs (a,b) which can be interpreted as a+b×√k (a+b×k*0.5).

Write a d-operator ⍺(⍺⍺ rplus)⍵ which does addition in Z[√⍺⍺].

      3 4 (5 rplus) ¯2 7
1 11

Write a d-operator ⍺(⍺⍺ rtimes)⍵ which does multiplication in Z[√⍺⍺].

      3 4 (5 rtimes) ¯2 7
134 13

Both of the above can be done using integer operations only.
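
Possible sketches, with a pair (a b) representing a+b×⍺⍺*0.5; multiplication uses (a+b√k)×(c+d√k) = (ac+k×bd)+(ad+bc)√k:

      rplus ←{k←⍺⍺ ⋄ ⍺+⍵}                    ⍝ ⍺⍺ referenced only to make this an operator
      rtimes←{k←⍺⍺ ⋄ (a b)←⍺ ⋄ (c d)←⍵ ⋄ ((a×c)+k×b×d),(a×d)+b×c}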

10.6 Repeated Squaring I

Write a function ⍺ pow ⍵ that computes ⍺*⍵ using repeated squaring. ⍺ is a number and ⍵ a non-negative integer. Hint: the binary representation of an integer n obtains as 2⊥⍣¯1⊢n.
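
One possible sketch, recursive rather than via an explicit binary representation, but still O(⍟⍵) multiplications:

      pow←{0=⍵:1 ⋄ 0=2|⍵:(⍺×⍺)∇⍵÷2 ⋄ ⍺×(⍺×⍺)∇⌊⍵÷2}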

10.7 Repeated Squaring II

Write a d-operator ⍺(⍺⍺ rpow)⍵ that computes ⍺ raised to the power ⍵ using repeated squaring. ⍺ is in Z[√⍺⍺] and ⍵ is a non-negative integer.

      ⊃ (5 rtimes)/ 13 ⍴ ⊂ 1 1
2134016 954368

      1 1 (5 rpow) 13
2134016 954368
      1 ¯1 (5 rpow) 13
2134016 ¯954368
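
One possible sketch, assuming rtimes from 10.5; the multiplicative identity in Z[√⍺⍺] is 1 0:

      rpow←{0=⍵:1 0 ⋄ 0=2|⍵:(⍺(⍺⍺ rtimes)⍺)∇⍵÷2 ⋄ ⍺(⍺⍺ rtimes)(⍺(⍺⍺ rtimes)⍺)∇⌊⍵÷2}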

The last two expressions are key steps in an O(⍟n) computation of the n-th Fibonacci number using integer operations, using Binet’s formula: the n-th Fibonacci number is (((1+5*0.5)*n)-(1-5*0.5)*n)÷(2*n)×5*0.5.
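
As an arithmetic check (233 is the 13th Fibonacci number): the integer parts of the two powers cancel in the subtraction, leaving (2×954368)×√5, and dividing by (2*13)×√5 gives:

      (2×954368)÷2*13
233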

50847534

⎕io←0 throughout.

I was re-reading A Mathematician’s Apology before recommending it to Suki Tekverk, our summer intern, and came across a statement that the number of primes less than 1e9 is 50847478 (§14, page 23). The function pco from the dfns workspace does computations on primes; ¯1 pco n is the number of primes less than n:

      )copy dfns pco

      ¯1 pco 1e9
50847534

The two numbers 50847478 and 50847534 cannot both be correct. A search of 50847478 on the internet reveals that it is incorrect and that 50847534 is correct. 50847478 is erroneously cited in various textbooks and even has a name, Bertelsen’s number, memorably described by MathWorld as “an erroneous name erroneously given the erroneous value of π(1e9) = 50847478.”

Although several internet sources give 50847534 as the number of primes less than 1e9, they don’t provide a proof. Relying on them would be doing the same thing — arguing from authority — that led other authors astray, even if it is the authority of the mighty internet. Besides, I’d already given Suki, an aspiring mathematician, the standard spiel about the importance of proof.

How do you prove the 50847534 number? One way is to prove correct a program that produces it. Proving pco correct seems daunting — it has 103 lines and two large tables. Therefore, I wrote a function from scratch to compute the number of primes less than ⍵, a function written in a way that facilitates proof.

sieve←{
  b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
  b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
  49≥⍵:b
  p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
  m←1+⌊(⍵-1+p×p)÷2×p
  b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}

      +/ sieve 1e9
50847534

sieve ⍵ produces a boolean vector b with length ⍵ such that b/⍳⍴b are all the primes less than ⍵. It implements the sieve of Eratosthenes: mark as not-prime multiples of each prime less than ⍵*0.5; this latter list of primes obtains by applying the function recursively to ⌈⍵*0.5. Some obvious optimizations are implemented:

  • Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
  • Subsequently, only odd multiples of primes > 5 are marked.
  • Multiples of a prime to be marked start at its square.

Further examples:

      +/∘sieve¨ ⍳10
0 0 0 1 2 2 3 3 4 4

      +/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

There are other functions which are much easier to prove correct, for example, sieve1←{2=+⌿0=(⍳3⌈⍵)∘.|⍳⍵}. However, sieve1 requires O(⍵*2) space and sieve1 1e9 cannot produce a result with current technology. (Hmm, a provably correct program that cannot produce the desired result …)

We can test that sieve is consistent with pco and that pco is self-consistent. pco is a model of the p: primitive function in J. Its cases are:
   pco n   the n-th prime
¯1 pco n   the number of primes less than n
 1 pco n   1 iff n is prime
 2 pco n   the prime factors and exponents of n
 3 pco n   the prime factorization of n
¯4 pco n   the next prime < n
 4 pco n   the next prime > n
10 pco r,s a boolean vector b such that r+b/⍳⍴b are the primes in the half-open interval [r,s).

      ¯1 pco 10*⍳10      ⍝ the number of primes < 1e0 1e1 ... 1e9
0 4 25 168 1229 9592 78498 664579 5761455 50847534

      +/ 10 pco 0 1e9    ⍝ sum of the sieve between 0 and 1e9
50847534
                         ⍝ sum of sums of 10 sieves
                         ⍝ each of size 1e8 from 0 to 1e9
      +/ t← {+/10 pco ⍵+0 1e8}¨ 1e8×⍳10
50847534
                         ⍝ sum of sums of 1000 sieves
                         ⍝ each of size 1e6 from 0 to 1e9
      +/ s← {+/10 pco ⍵+0 1e6}¨ 1e6×⍳1000
50847534

      t ≡ +/ 10 100 ⍴ s
1
                         ⍝ sum of sums of sieves with 1000 randomly
                         ⍝ chosen end-points, from 0 to 1e9
      +/ 2 {+/10 pco ⍺,⍵}/ 0,({⍵[⍋⍵]}1000?1e9),1e9
50847534

      ¯1 pco 1e9         ⍝ the number of primes < 1e9
50847534
      pco 50847534       ⍝ the 50847534-th prime
1000000007
      ¯4 pco 1000000007  ⍝ the next prime < 1000000007
999999937
      4 pco 999999937    ⍝ the next prime > 999999937
1000000007     
      4 pco 1e9          ⍝ the next prime > 1e9
1000000007

                         ⍝ are 999999937 1000000007 primes?
      1 pco 999999937 1000000007
1 1
                         ⍝ which of 999999930 ... 1000000009
                         ⍝ are prime?
      1 pco 999999930+4 20⍴⍳80
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1

      +/∘sieve¨ 999999937 1000000007 ∘.+ ¯1 0 1
50847533 50847533 50847534
50847534 50847534 50847535

Constructions

Suki Tekverk, a summer intern, spent the last two weeks here studying APL. She will be a high school senior next September and is adept in math, so in addition to APL we also considered some math problems, proofs, and proof techniques, including the following:

Given line segments x and y, construct (using compass and straight-edge) line segments for the following values:
   x+y
   x-y
   x×y
   x÷y

The first two are immediate. Constructing the last two are straightforward if you are also given 1 (or some other reference length from which to construct 1). Can you construct x×y and x÷y without 1?

Constructing x÷y is impossible if you are not given 1: From x and y alone you can not determine how they compare to 1. If you can construct x÷y, then you can construct x÷x and therefore relate x and y to 1, contradicting the previous statement.

Can you construct x×y without 1? I got stuck (and lazy) and posed the question to the J Chat Forum, and received a solution from Raul Miller in less than half an hour. Miller’s proof recast in terms similar to that for x÷y is as follows:

From x and y alone you can not determine how they compare to 1. If you can construct x×y, then you can construct x×x, whence:
   if x<x×x then x>1
   if x=x×x then x=1
   if x>x×x then x<1
contradicting the previous statement.

I last thought about this problem in my first year of college many decades ago. At the time Norman M. (a classmate) argued that there must be a 1 and then did the construction for x÷y using 1. I recall he said “there must be a 1” in the sense of “1 has to exist if x and y exist” rather than that “you have to use a 1 in the construction” or “you can not construct x÷y without using 1”. I don’t remember what we did with x×y; before Miller’s message I had some doubt that perhaps we were able to construct x×y without using 1 all those years ago. (Norman went on to get a Ph.D. at MIT and other great things.)

A Memo Operator

Abstract

Some functions are simply stated and easily understood as multiply-recursive functions. For example, the Fibonacci numbers are {1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}. A drawback of multiple recursion is abysmal performance even for moderate-sized arguments, consequently motivating resorts to circumlocutions. The memo operator addresses the performance problem and restores multiple recursion to the toolkit of thought. We present a memo operator modelled as a 6-line d-operator.

⎕io←0 throughout.

The Man Who Knew Infinity

Jay sent out an e-mail asking who would be interested to go see the film The Man Who Knew Infinity. Fiona’s response: “Ah, Mr. 1729; I’m in.” John was game, and so was I.

The Number of Partitions

Nick found an essay by Stephen Wolfram on Ramanujan, written on the occasion of the film, wherein Wolfram said:

A notable paper [Ramanujan] wrote with Hardy concerns the partition function (PartitionsP in the Wolfram Language) that counts the number of ways an integer can be written as a sum of positive integers. …

In Ramanujan’s day, computing the exact value of PartitionsP[200] was a big deal — and the climax of his paper. But now, thanks to Ramanujan’s method, it’s instantaneous:

In[1]:= PartitionsP[200]
Out[1]= 3 972 999 029 388

A partition of a non-negative integer n is a vector v of positive integers such that n = +/v, where the order in v is not significant. For example, the partitions of the first seven non-negative integers are:

┌┐
││
└┘
┌─┐
│1│
└─┘
┌─┬───┐
│2│1 1│
└─┴───┘
┌─┬───┬─────┐
│3│2 1│1 1 1│
└─┴───┴─────┘
┌─┬───┬───┬─────┬───────┐
│4│3 1│2 2│2 1 1│1 1 1 1│
└─┴───┴───┴─────┴───────┘
┌─┬───┬───┬─────┬─────┬───────┬─────────┐
│5│4 1│3 2│3 1 1│2 2 1│2 1 1 1│1 1 1 1 1│
└─┴───┴───┴─────┴─────┴───────┴─────────┘
┌─┬───┬───┬─────┬───┬─────┬───────┬─────┬───────┬─────────┬───────────┐
│6│5 1│4 2│4 1 1│3 3│3 2 1│3 1 1 1│2 2 2│2 2 1 1│2 1 1 1 1│1 1 1 1 1 1│
└─┴───┴───┴─────┴───┴─────┴───────┴─────┴───────┴─────────┴───────────┘

Can we compute the partition counting function “instantaneously” in APL?

We will use a recurrence relation derived from the pentagonal number theorem, proved by Euler more than 160 years before Hardy and Ramanujan, equation (11) in http://mathworld.wolfram.com/PartitionFunctionP.html:

   p(n) = +/ (¯1*k+1) × p(n - k×(3×k-1)÷2) + p(n - k×(3×k+1)÷2)   over k←1 2 3 …, with p(0)=1 and p(n)=0 for negative n

In APL:

      pn  ← {0≥⍵:0=⍵ ⋄ -/+⌿∇¨rec ⍵}
      rec ← {⍵ - (÷∘2 (×⍤1) ¯1 1 ∘.+ 3∘×) 1+⍳⌈0.5*⍨⍵×2÷3}

      pn¨0 1 2 3 4 5 6
1 1 2 3 5 7 11

      pn 30
5604

Warning: don’t try pn 200 because that would take a while! Why? pn 200 would engender pn being applied to each element of rec 200 and:

   rec 200
199 195 188 178 165 149 130 108 83 55 24 ¯10
198 193 185 174 160 143 123 100 74 45 13 ¯22

Each of the non-negative values would itself engender further recursive applications.

A Memo Operator

I recalled from J the memo adverb (monadic operator) M. Paraphrasing the J dictionary: f M is the same as f but may keep a record of the arguments and results for reuse. It is commonly used for multiply-recursive functions. Sounds like just the ticket!

The functionistas had previously implemented a (dyadic) memo operator. The version here is more restrictive but simpler. The restrictions are as follows:

  • The cache is constructed “on the fly” and is discarded once the operand finishes execution.
  • The operand is a dfn {condition:basis ⋄ expression} where none of condition, basis, and expression contain a ⋄, and recursion is denoted by ∇ in expression.
  • The arguments are scalar integers.
  • The operand function does not have ¯1 as a result.
  • A recursive call is on a smaller integer.

With these restrictions, our memo operator can be defined as follows:

M←{
 f←⍺⍺
 i←2+'⋄'⍳⍨t←3↓,⎕CR'f'
 0=⎕NC'⍺':⍎' {T←(1+  ⍵)⍴¯1 ⋄  {',(i↑t),'¯1≢T[⍵]  :⊃T[⍵]   ⋄ ⊃T[⍵]  ←⊂',(i↓t),'⍵}⍵'
          ⍎'⍺{T←(1+⍺,⍵)⍴¯1 ⋄ ⍺{',(i↑t),'¯1≢T[⍺;⍵]:⊃T[⍺;⍵] ⋄ ⊃T[⍺;⍵]←⊂',(i↓t),'⍵}⍵'
}

      pn M 10
42

      pn 10
42

Obviously, the ⍎ lines in M are crucial. When the operand is pn, the expression which is executed is as follows; the characters inserted by M are everything other than the original text of pn.

{T←(1+⍵)⍴¯1 ⋄ {0≥⍵:0=⍵ ⋄ ¯1≢T[⍵]:⊃T[⍵] ⋄ ⊃T[⍵]←⊂-/+⌿∇¨rec ⍵}⍵}

The machinations produce a worthwhile performance improvement:

      pn M 30
5604
      cmpx 'pn M 30'
3.94E¯4
      cmpx 'pn 30'
4.49E1 
      4.49e1 ÷ 3.94e¯4
113959

That is, pn M 30 is faster than pn 30 by a factor of 114 thousand.

Can APL compute pn M 200 “instantaneously”? Yes it can, for a suitable definition of “instantaneous”.

      cmpx 'm←pn M 200'
4.47E¯3
      m
3.973E12
      0⍕m
 3972999029388

M also works on operands which are anonymous, dyadic, or have non-scalar results.

Fibonacci Numbers

   fib←{1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}

   fib¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
   fib M¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377

   cmpx 'fib 30'
1.45E0 
   cmpx 'fib M 30'
1.08E¯4

Anonymous function:

      {1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}M¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377

Combinations

⍺ comb ⍵ produces a matrix of all the size-⍺ combinations of ⍳⍵. The algorithm is from the venerable APL: An Interactive Approach by Gilman and Rose.

      comb←{(⍺≥⍵)∨0=⍺:((⍺≤⍵),⍺)⍴⍳⍺ ⋄ (0,1+(⍺-1)∇ ⍵-1)⍪1+⍺ ∇ ⍵-1}

      3 comb 5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4

      3 (comb ≡ comb M) 5
1

      cmpx '8 comb M 16' '8 comb 16'
  8 comb M 16 → 1.00E¯3 |     0% ⎕                             
  8 comb 16   → 3.63E¯2 | +3526% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

A faster comb is possible without a memo operator, but it’s not easy. (See for example section 4.1 of my APL87 paper.) A memo operator addresses the performance problem with multiple recursion and restores it to the toolkit of thought.

Presented at the BAA seminar at the Royal Society of Arts on 2016-05-20