Calculation v Look-Up

(⎕io←0 and timings are done in Dyalog version 15.0.)

Table Look-Up

Some functions can be computed faster by table look-up than a more traditional and more conventional calculation. For example:

      b←?1e6⍴2

      cmpx '*b' '(*0 1)[b]'
 *b        → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*0 1)[b] → 1.23E¯3 | -91% ⎕⎕⎕

Some observations about this benchmark:

(a) The advantage of table look-up depends on the time to calculate the function directly, versus the time to do indexing, ⍺[⍵] or ⍵⌷⍺, plus a small amount of time to calculate the function on a (much) smaller representative set.

Indexing is particularly efficient when the indices are boolean:

      bb←b,1
      bi←b,2
      cmpx '2 3[bb]' '2 3 3[bi]'
 2 3[bb]   → 1.12E¯4 |    0% ⎕⎕⎕⎕⎕
 2 3 3[bi] → 7.39E¯4 | +558% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

bi is the same as bb except that the last element of 2 forces it to be 1-byte integers (for bi) instead of 1-bit booleans (for bb).

(b) Although the faster expression works best with 0-origin, the timing for 1-origin is similar.

      cmpx '*b' '{⎕io←0 ⋄ (*0 1)[⍵]}b'
 *b                   → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 {⎕io←0 ⋄ (*0 1)[⍵]}b → 1.22E¯3 | -91% ⎕⎕⎕

That is, if the current value of ⎕io is 1, the implementation can use a local ⎕io setting of 0.

(c) The fact that *b is currently slower than (*0 1)[b] represents a judgment by the implementers that *b is not a sufficiently common computation to warrant writing faster code for it. Other similar functions already embody the table look-up technique:

      cmpx '⍕b' '¯1↓,(2 2⍴''0 1 '')[b;]'
 ⍕b                   → 6.95E¯4 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ¯1↓,(2 2⍴'0 1 ')[b;] → 6.92E¯4 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Small Domains

Table look-up works best for booleans, but it also works for other small domains such as 1-byte integers:

      i1←?1e6⍴128    ⍝ 1-byte integers

      cmpx '*i1' '(*⍳128)[i1]'
 *i1         → 1.48E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*⍳128)[i1] → 9.30E¯4 | -94% ⎕⎕

      cmpx '○i1' '(○⍳128)[i1]'
 ○i1         → 2.78E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (○⍳128)[i1] → 9.25E¯4 | -67% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

In general, a table for 1-byte integers needs to have values for ¯128+⍳256. Here ⍳128 is used to simulate that in C indexing can be done on 1-byte indices without extra computation on the indices. In APL, general 1-byte indices are used as table[i1+128]; in C, the expression is (table+offset)[i].

Domains considered “small” get bigger all the time as machines come equipped with ever larger memories.

Larger Domains

Table look-up is applicable when the argument is not a substantial subset of a large domain and there is a fast way to detect that it is not.

      i4s←1e7+?1e6⍴1e5    ⍝ small-range 4-byte integers

      G←{(⊂u⍳⍵)⌷⍺⍺ u←∪⍵}

      cmpx '⍟i4s' '⍟G i4s'
 ⍟i4s   → 1.44E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ⍟G i4s → 9.59E¯3 | -34% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The definition of G implements that the argument is not used directly as indices (as in (⊂⍵)⌷⍺⍺ u) but needed to be looked up, the u⍳⍵ in (⊂u⍳⍵)⌷⍺⍺ u. Thus both index-of and indexing are key enablers for making table look-up competitive.

§4.3 of Notation as a Tool of Thought presents a different way to distribute the results of a calculation on the “nub” (unique items). But it takes more time and space and is limited to numeric scalar results.

      H←{(⍺⍺ u)+.×(u←∪⍵)∘.=⍵}

      i4a←1e7+?1e4⍴1e3    ⍝ small-range 4-byte integers

      cmpx '⍟i4a' '⍟G i4a' '⍟H i4a'
 ⍟i4a   → 1.56E¯4 |       0%
 ⍟G i4a → 6.25E¯5 |     -60%
 ⍟H i4a → 1.78E¯1 | +113500% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The benchmark is done on the smaller i4a as a benchmark on i4s founders on the ≢∪⍵ by ≢⍵ boolean matrix (for i4s, 12.5 GB = ×/ 0.125 1e5 1e6) created by H.

Mathematical Background

New Math” teaches that a function is a set of ordered pairs whose first items are unique:

*⍵: {(⍵,*⍵)|⍵∊C}      The set of all (⍵,*⍵) where is a complex number
⍟⍵: {(⍵,⍟⍵)|⍵∊C~{0}}  The set of all (⍵,⍟⍵) where is a non-zero complex number
○⍵: {(⍵,○⍵)|⍵∊C}      The set of all (⍵,○⍵) where is a complex number
⍕⍵: {(⍵,⍕⍵)|0=⍴⍴⍵}    The set of all (⍵,⍕⍵) where is a scalar

When is restricted (for example) to the boolean domain, the functions, the sets of ordered pairs, are more simply represented by enumerating all the possibilities:

*⍵: {(0,1), (1,2.71828)}
⍟⍵: {(0,Err), (1,0)}
○⍵: {(0,0), (1,3.14159)}
⍕⍵: {(0,'0'), (1,'1')}

Realized and Potential Performance Improvements

realized (available no later than 15.0)

boolean
1-byte integer
⍕ ∘.f
⍕     a∘⍕

potential (non-exhaustive list)

boolean
1-byte integer
2-byte integer
small-range 4-byte integer
* ⍟ | - ○   ! ⍳ a∘f f.g
* ⍟ | - ○ × ! ⍳ a∘f f.g ∘.f
* ⍟ | - ○ ×   ⍳ a∘f
* ⍟ |     ×

a is a scalar; f and g are primitive scalar dyadic functions.

APL Exercises

These exercises are designed to introduce APL to a high school senior adept in mathematics. The entire set of APL Exercises has the following sections:

Introduction
0. Beginnings
1. Utilities
2. Recursion
3. Proofs
4. Rank Operator
5. Index-Of
6. Key Operator
7. Indexing
8. Grade and Sort
9. Power Operator
10. Arithmetic
11. Combinatorial Objects
The Fine Print

Sections 2 and 10 are the most interesting ones and are as follows. The full set is too lengthy (and too boring?) to be included here. Some students may be able to make up their own exercises from the list of topics. As well, it may be useful to know what the “experts” think are important aspects of APL.

2. Recursion

⎕io←0 throughout.

2.0 Factorial

Write a function to compute the factorial function on integer , ⍵≥0.

      fac¨ 5 6 7 8
120 720 5040 40320

      n←1+?20
      (fac n) = n×fac n-1
1
      fac 0
1

2.1 Fibonacci Numbers

Write a function to compute the -th Fibonacci number, ⍵≥0.

      fib¨ 5 6 7 8
5 8 13 21    

      n←2+?20
      (fib n) = (fib n-2) + (fib n-1)
1
      fib 0
0
      fib 1
1

If the function written above is multiply recursive, write a version which is singly recursive (only one occurrence of ). Use cmpx to compare the performance of the two functions on the argument 35.

2.2 Peano Addition

Let and be natural numbers (non-negative integers). Write a function padd to compute ⍺+⍵ without using + itself (or - or × or ÷ …). The only functions allowed are comparison to 0 and the functions pre←{⍵-1} and suc←{⍵+1}.

2.3 Peano Multiplication

Let and be natural numbers (non-negative integers). Write a function ptimes to compute ⍺×⍵ without using × itself (or + or - or ÷ …). The only functions allowed are comparison to 0 and the functions pre←{⍵-1} and suc←{⍵+1} and padd.

2.4 Binomial Coefficients

Write a function to produce the binomal coefficients of order , ⍵≥0.

      bc 0
1
      bc 1
1 1
      bc 2
1 2 1
      bc 3
1 3 3 1
      bc 4
1 4 6 4 1

2.5 Cantor Set

Write a function to compute the Cantor set of order , ⍵≥0.

      Cantor 0
1
      Cantor 1
1 0 1
      Cantor 2
1 0 1 0 0 0 1 0 1
      Cantor 3
1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1

The classical version of the Cantor set starts with the interval [0,1] and at each stage removes the middle third from each remaining subinterval:

[0,1] →
[0,1/3] ∪ [2/3,1] →
[0,1/9] ∪ [2/9,1/3] ∪ [2/3,7/9] ∪ [8/9,1] → ….

      {(+⌿÷≢)Cantor ⍵}¨ 3 5⍴⍳15
1         0.666667  0.444444   0.296296   0.197531  
0.131687  0.0877915 0.0585277  0.0390184  0.0260123 
0.0173415 0.011561  0.00770735 0.00513823 0.00342549

      (2÷3)*3 5⍴⍳15
1         0.666667  0.444444   0.296296   0.197531  
0.131687  0.0877915 0.0585277  0.0390184  0.0260123 
0.0173415 0.011561  0.00770735 0.00513823 0.00342549

2.6 Sierpinski Carpet

Write a function to compute the Sierpinski Carpet of order , ⍵≥0.

      SC 0
1
      SC 1
1 1 1
1 0 1
1 1 1
      SC 2
1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 0 0 0 1 1 1
1 0 1 0 0 0 1 0 1
1 1 1 0 0 0 1 1 1
1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 1 0 1
1 1 1 1 1 1 1 1 1

      ' ⌹'[SC 3]
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹         ⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹         ⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹         ⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹         ⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹         ⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹         ⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹⌹ ⌹   ⌹ ⌹
⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹⌹⌹⌹   ⌹⌹⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹
⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹⌹ ⌹
⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹⌹

3-D analogs of the Cantor set and the Sierpinski carpet are the Sierpinski sponge or the Menger sponge.

2.7 Extended H

Write a function for the extend H algorithm for , ⍵≥0, which embeds the complete binary tree of depth on the plane. In ×H ⍵ there are 2*⍵ leaves, instances of the letter o with exactly one neighbor (or no neighbors for 0=⍵). The root is at the center of the matrix.

      xH¨ ⍳5
┌─┬─────┬─────┬─────────────┬─────────────┐
│o│o-o-o│o   o│o-o-o   o-o-o│o   o   o   o│
│ │     │|   |│  |       |  │|   |   |   |│
│ │     │o-o-o│  o---o---o  │o-o-o   o-o-o│
│ │     │|   |│  |       |  │| | |   | | |│
│ │     │o   o│o-o-o   o-o-o│o | o   o | o│
│ │     │     │             │  |       |  │
│ │     │     │             │  o---o---o  │
│ │     │     │             │  |       |  │
│ │     │     │             │o | o   o | o│
│ │     │     │             │| | |   | | |│
│ │     │     │             │o-o-o   o-o-o│
│ │     │     │             │|   |   |   |│
│ │     │     │             │o   o   o   o│
└─┴─────┴─────┴─────────────┴─────────────┘

Write a function that has the same result as xH but checks the result using the assert utility. For example:

assert←{⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕SIGNAL 8 ⋄ shy←0}

xH1←{
  h←xH ⍵
  assert 2=⍴⍴h:
  assert h∊'○ -|':
  assert (¯1+2*1+⍵)=+/'o'=,h:
  ...
  h
}

2.8 Tower of Hanoi

(By André Karwath aka Aka (Own work) [CC BY-SA 2.5 (http://creativecommons.org/licenses/by-sa/2.5)], via Wikimedia Commons)

The Tower of Hanoi problem is to move a set of different-sized disks from one peg to another, moving one disk at a time, using an intermediate peg if necessary. At all times no larger disk may sit on top of a smaller disk.

Write a function Hanoi to solve the problem of moving disks from peg 0 to peg 2. Since it’s always the disk which is at the top of a peg which is moved, the solution can be stated as a 2-column matrix with column 0 indicating the source peg and column 1 the destination peg.

      Hanoi¨ ⍳5
┌───┬───┬───┬───┬───┐
│   │0 2│0 1│0 2│0 1│
│   │   │0 2│0 1│0 2│
│   │   │1 2│2 1│1 2│
│   │   │   │0 2│0 1│
│   │   │   │1 0│2 0│
│   │   │   │1 2│2 1│
│   │   │   │0 2│0 1│
│   │   │   │   │0 2│
│   │   │   │   │1 2│
│   │   │   │   │1 0│
│   │   │   │   │2 0│
│   │   │   │   │1 2│
│   │   │   │   │0 1│
│   │   │   │   │0 2│
│   │   │   │   │1 2│
└───┴───┴───┴───┴───┘

Prove that Hanoi ⍵ has ¯1+2*⍵ rows.

10. Arithmetic

⎕io←0 throughout.

10.0 Primality Testing

Write a function prime ⍵ which is 1 or 0 depending on whether non-negative integer is a prime number. For example:

      prime 1
0
      prime 2
1
      prime 9
0
      prime¨ 10 10⍴⍳100
0 0 1 1 0 1 0 1 0 0
0 1 0 1 0 0 0 1 0 1
0 0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 1 0 0
0 1 0 1 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 0 1 0 0
0 1 0 1 0 0 0 0 0 1
0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 0 0 1 0 0

10.1 Primes Less Than n

Write a function primes ⍵ using the sieve method which produces all the primes less than . Use cmpx to compare the speed of primes n and (prime¨⍳n)/⍳n. For example:

      primes 7
2 3 5
      primes 50
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

      (prime¨⍳50)/⍳50
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

10.2 Factoring

Write a function factor which produces the prime factorization of , such that ⍵ = ×/factor ⍵ and ∧/ prime¨ factor ⍵ is 1. For example:

      factor 144
2 2 2 2 3 3
      ×/ factor 144
144
      ∧/ prime¨ factor 144
1

      factor 1

      factor 2
2

10.3 pco

Read about the pco function in http://dfns.dyalog.com/n_pco.htm and experiment with it.

      )copy dfns pco

      pco ⍳7
2 3 5 7 11 13 17

      1 pco ⍳10
0 0 1 1 0 1 0 1 0 0

G.H. Hardy states in A Mathematician’s Apology (chapter 14, page 23) that the number of primes less than 1e9 is 50847478. Use pco to check whether this is correct.

10.4 GCD

Write a function ⍺ gcd ⍵ to compute the greatest common divisor of non-negative integers and using the Euclidean algorithm.

Write a function ⍺ egcd ⍵ to compute the GCD of and as coefficients of and , such that (⍺ gcd ⍵) = (⍺,⍵)+.×⍺ egcd ⍵.

      112 gcd 144
16
      112 egcd 144
4 ¯3
      112 144 +.× 112 egcd 144
16

10.5 Ring Extension

Z[√k] is the ring of integers Z extended with √k where k is not a perfect square. The elements of Z[√k] are ordered pairs (a,b) which can be interpreted as a+b×√k (a+b×k*0.5).

Write a d-operator ⍺(⍺⍺ rplus)⍵ which does addition in Z[√⍺⍺].

      3 4 (5 rplus) ¯2 7
1 11

Write a d-operator ⍺(⍺⍺ rtimes)⍵ which does multiplication in Z[√⍺⍺].

      3 4 (5 rtimes) ¯2 7
134 13

Both of the above can be done using integer operations only.

10.6 Repeated Squaring I

Write a function ⍺ pow ⍵ that computes ⍺*⍵ using repeated squaring. is a number and a non-negative integer. Hint: the binary representation of an integer n obtains as 2⊥⍣¯1⊢n.

10.7 Repeated Squaring II

Write a d-operator ⍺(⍺⍺ rpow)⍵ that computes raised to the power using repeated squaring. is in Z[√⍺⍺] and is a non-negative integer.

      ⊃ (5 rtimes)/ 13 ⍴ ⊂ 1 1
2134016 954368

      1 1 (5 rpow) 13
2134016 954368
      1 ¯1 (5 rpow) 13
2134016 ¯954368

The last two expressions are key steps in an O(⍟n) computation of the n-th Fibonacci number using integer operations, using Binet’s formula:

50847534

⎕io←0 throughout.

I was re-reading A Mathematician’s Apology before recommending it to Suki Tekverk, our summer intern, and came across a statement that the number of primes less than 1e9 is 50847478 (§14, page 23). The function pco from the dfns workspace does computations on primes; ¯1 pco n is the number of primes less than n:

      )copy dfns pco

      ¯1 pco 1e9
50847534

The two numbers 50847478 and 50847534 cannot both be correct. A search of 50847478 on the internet reveals that it is incorrect and that 50847534 is correct. 50847478 is erroneously cited in various textbooks and even has a name, Bertelsen’s number, memorably described by MathWorld as “an erroneous name erroneously given the erroneous value of π(1e9) = 50847478.”

Although several internet sources give 50847534 as the number of primes less than 1e9, they don’t provide a proof. Relying on them would be doing the same thing — arguing from authority — that led other authors astray, even if it is the authority of the mighty internet. Besides, I’d already given Suki, an aspiring mathematician, the standard spiel about the importance of proof.

How do you prove the 50847534 number? One way is to prove correct a program that produces it. Proving pco correct seems daunting — it has 103 lines and two large tables. Therefore, I wrote a function from scratch to compute the number of primes less than , a function written in a way that facilitates proof.

sieve←{
  b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
  b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
  49≥⍵:b
  p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
  m←1+⌊(⍵-1+p×p)÷2×p
  b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}

      +/ sieve 1e9
50847534

sieve ⍵ produces a boolean vector b with length such that b/⍳⍴b are all the primes less than . It implements the sieve of Eratosthenes: mark as not-prime multiples of each prime less than ⍵*0.5; this latter list of primes obtains by applying the function recursively to ⌈⍵*0.5. Some obvious optimizations are implemented:

  • Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
  • Subsequently, only odd multiples of primes > 5 are marked.
  • Multiples of a prime to be marked start at its square.

Further examples:

      +/∘sieve¨ ⍳10
0 0 0 1 2 2 3 3 4 4

      +/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

There are other functions which are much easier to prove correct, for example, sieve1←{2=+⌿0=(⍳3⌈⍵)∘.|⍳⍵}. However, sieve1 requires O(⍵*2) space and sieve1 1e9 cannot produce a result with current technology. (Hmm, a provably correct program that cannot produce the desired result …)

We can test that sieve is consistent with pco and that pco is self-consistent. pco is a model of the p: primitive function in J. Its cases are:
   pco n   the n-th prime
¯1 pco n   the number of primes less than n
 1 pco n   1 iff n is prime
 2 pco n   the prime factors and exponents of n
 3 pco n   the prime factorization of n
¯4 pco n   the next prime < n
 4 pco n   the next prime > n
10 pco r,s a boolean vector b such that r+b/⍳⍴b are the primes in the half-open interval [r,s).

      ¯1 pco 10*⍳10      ⍝ the number of primes < 1e0 1e1 ... 1e9
0 4 25 168 1229 9592 78498 664579 5761455 50847534

      +/ 10 pco 0 1e9    ⍝ sum of the sieve between 0 and 1e9
50847534
                         ⍝ sum of sums of 10 sieves
                         ⍝ each of size 1e8 from 0 to 1e9
      +/ t← {+/10 pco ⍵+0 1e8}¨ 1e8×⍳10
50847534
                         ⍝ sum of sums of 1000 sieves
                         ⍝ each of size 1e6 from 0 to 1e9
      +/ s← {+/10 pco ⍵+0 1e6}¨ 1e6×⍳1000
50847534

      t ≡ +/ 10 100 ⍴ s
1
                         ⍝ sum of sums of sieves with 1000 randomly
                         ⍝ chosen end-points, from 0 to 1e9
      +/ 2 {+/10 pco ⍺,⍵}/ 0,({⍵[⍋⍵]}1000?1e9),1e9
50847534

      ¯1 pco 1e9         ⍝ the number of primes < 1e9
50847534
      pco 50847534       ⍝ the 50847534-th prime
1000000007
      ¯4 pco 1000000007  ⍝ the next prime < 1000000007
999999937
      4 pco 999999937    ⍝ the next prime > 999999937
1000000007     
      4 pco 1e9          ⍝ the next prime > 1e9
1000000007

                         ⍝ are 999999937 1000000007 primes?
      1 pco 999999937 1000000007
1 1
                         ⍝ which of 999999930 ... 1000000009
                         ⍝ are prime?
      1 pco 999999930+4 20⍴⍳80
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1

      +/∘sieve¨ 999999937 1000000007 ∘.+ ¯1 0 1
50847533 50847533 50847534
50847534 50847534 50847535

Constructions

Suki Tekverk, a summer intern, spent the last two weeks here studying APL. She will be a high school senior next September and is adept in math, so in addition to APL we also considered some math problems, proofs, and proof techniques, including the following:

Given line segments x and y, construct (using compass and straight-edge) line segments for the following values:
   x+y
   x-y
   x×y
   x÷y

The first two are immediate. Constructing the last two are straightforward if you are also given 1 (or some other reference length from which to construct 1). Can you construct x×y and x÷y without 1?

Constructing x÷y is impossible if you are not given 1 : From x and y alone you can not determine how they compare to 1. If you can construct x÷y, then you can construct x÷x and therefore relate x and y to 1, contradicting the previous statement.

Can you construct x×y without 1? I got stuck (and lazy) and posed the question to the J Chat Forum, and received a solution from Raul Miller in less than half an hour. Miller’s proof recast in terms similar to that for x÷y is as follows:

From x and y alone you can not determine how they compare to 1. If you can construct x×y, then you can construct x×x, whence:
   if x<x×x then x>1
   if x=x×x then x=1
   if x>x×x then x<1
contradicting the previous statement.

I last thought about this problem in my first year of college many decades ago. At the time Norman M. (a classmate) argued that there must be a 1 and then did the construction for x÷y using 1. I recall he said “there must be a 1” in the sense of “1 has to exist if x and y exist” rather than that “you have to use a 1 in the construction” or “you can not construct x÷y without using 1”. I don’t remember what we did with x×y; before Miller’s message I had some doubt that perhaps we were able to construct x×y without using 1 all those years ago. (Norman went on to get a Ph.D. at MIT and other great things.)

A Memo Operator

Abstract

Some functions are simply stated and easily understood as multiply-recursive functions. For example, the Fibonacci numbers are {1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}. A drawback of multiple recursion is abysmal performance even for moderate-sized arguments, consequently motivating resorts to circumlocutions. The memo operator addresses the performance problem and restores multiple recursion to the toolkit of thought. We present a memo operator modelled as a 6-line d-operator.

⎕io←0 throughout.

The Man Who Knew Infinity

Jay sent out an e-mail asking who would be interested to go see the film The Man Who Knew Infinity. Fiona’s response: “Ah, Mr. 1729; I’m in.” John was game, and so was I.

The Number of Partitions
Nick found an essay by Stephen Wolfram on Ramanujan written on occasion of the film, wherein Wolfram said:

A notable paper [Ramanujan] wrote with Hardy concerns the partition function (PartitionsP in the Wolfram Language) that counts the number of ways an integer can be written as a sum of positive integers. …

In Ramanujan’s day, computing the exact value of PartitionsP[200] was a big deal — and the climax of his paper. But now, thanks to Ramanujan’s method, it’s instantaneous:

In[1]:= PartitionsP[200]
Out[1]= 3 972 999 029 388

A partition of a non-negative integer n is a vector v of positive integers such that n = +/v, where the order in v is not significant. For example, the partitions of the first seven non-negative integers are:

┌┐
││
└┘
┌─┐
│1│
└─┘
┌─┬───┐
│2│1 1│
└─┴───┘
┌─┬───┬─────┐
│3│2 1│1 1 1│
└─┴───┴─────┘
┌─┬───┬───┬─────┬───────┐
│4│3 1│2 2│2 1 1│1 1 1 1│
└─┴───┴───┴─────┴───────┘
┌─┬───┬───┬─────┬─────┬───────┬─────────┐
│5│4 1│3 2│3 1 1│2 2 1│2 1 1 1│1 1 1 1 1│
└─┴───┴───┴─────┴─────┴───────┴─────────┘
┌─┬───┬───┬─────┬───┬─────┬───────┬─────┬───────┬─────────┬───────────┐
│6│5 1│4 2│4 1 1│3 3│3 2 1│3 1 1 1│2 2 2│2 2 1 1│2 1 1 1 1│1 1 1 1 1 1│
└─┴───┴───┴─────┴───┴─────┴───────┴─────┴───────┴─────────┴───────────┘

Can we compute the partition counting function “instantaneously” in APL?

We will use a recurrence relation derived from the pentagonal number theorem, proved by Euler more than 160 years before Hardy and Ramanujan:

equation (11) in http://mathworld.wolfram.com/PartitionFunctionP.html. In APL:

      pn  ← {0≥⍵:0=⍵ ⋄ -/+⌿∇¨rec ⍵}
      rec ← {⍵ - (÷∘2 (×⍤1) ¯1 1 ∘.+ 3∘×) 1+⍳⌈0.5*⍨⍵×2÷3}

      pn¨0 1 2 3 4 5 6
1 1 2 3 5 7 11

      pn 30
5604

Warning: don’t try pn 200 because that would take a while! Why? pn 200 would engender pn being applied to each element of rec 200 and:

   rec 200
199 195 188 178 165 149 130 108 83 55 24 ¯10
198 193 185 174 160 143 123 100 74 45 13 ¯22

Each of the non-negative values would itself engender further recursive applications.

A Memo Operator

I recalled from J the memo adverb (monadic operator) M. Paraphrasing the J dictionary: f M is the same as f but may keep a record of the arguments and results for reuse. It is commonly used for multiply-recursive functions. Sounds like just the ticket!

The functionistas had previously implemented a (dyadic) memo operator. The version here is more restrictive but simpler. The restrictions are as follows:

  • The cache is constructed “on the fly” and is discarded once the operand finishes execution.
  • The operand is a dfn {condition:basis ⋄ expression} where none of condition, base and expression contain a and recursion is denoted by in expression
  • The arguments are scalar integers.
  • The operand function does not have ¯1 as a result.
  • A recursive call is on a smaller integer.

With these restrictions, our memo operator can be defined as follows:

M←{
 f←⍺⍺
 i←2+'⋄'⍳⍨t←3↓,⎕CR'f'
 0=⎕NC'⍺':⍎' {T←(1+  ⍵)⍴¯1 ⋄  {',(i↑t),'¯1≢T[⍵]  :⊃T[⍵]   ⋄ ⊃T[⍵]  ←⊂',(i↓t),'⍵}⍵'
          ⍎'⍺{T←(1+⍺,⍵)⍴¯1 ⋄ ⍺{',(i↑t),'¯1≢T[⍺;⍵]:⊃T[⍺;⍵] ⋄ ⊃T[⍺;⍵]←⊂',(i↓t),'⍵}⍵'
}

      pn M 10
42

      pn 10
42

Obviously, the lines in M are crucial. When the operand is pn, the expression which is executed is as follows; the characters inserted by M are shown in red.

{T←(1+⍵)⍴¯1 ⋄ {0≥⍵:0=⍵ ⋄ ¯1≢T[⍵]:⊃T[⍵] ⋄ ⊃T[⍵]←⊂-/+⌿∇¨rec ⍵}⍵}

The machinations produce a worthwhile performance improvement:

      pn M 30
5604
      cmpx 'pn M 30'
3.94E¯4
      cmpx 'pn 30'
4.49E1 
      4.49e1 ÷ 3.94e¯4
113959

That is, pn M 30 is faster than pn 30 by a factor of 114 thousand.

Can APL compute pn M 200 “instantaneously”? Yes it can, for a suitable definition of “instantaneous”.

      cmpx 'm←pn M 200'
4.47E¯3
      m
3.973E12
      0⍕m
 3972999029388

M also works on operands which are anonymous, dyadic, or have non-scalar results.

Fibonacci Numbers

   fib←{1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}

   fib¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
   fib M¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377

   cmpx 'fib 30'
1.45E0 
   cmpx 'fib M 30'
1.08E¯4

Anonymous function:

      {1≥⍵:⍵ ⋄ (∇ ⍵-2)+∇ ⍵-1}M¨ ⍳15
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377

Combinations
⍺ comb ⍵ produces a matrix of all the size- combinations of ⍳⍵. The algorithm is from the venerable APL: An Interactive Approach by Gilman and Rose.

      comb←{(⍺≥⍵)∨0=⍺:((⍺≤⍵),⍺)⍴⍳⍺ ⋄ (0,1+(⍺-1)∇ ⍵-1)⍪1+⍺ ∇ ⍵-1}

      3 comb 5
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4

      3 (comb ≡ comb M) 5
1

      cmpx '8 comb M 16' '8 comb 16'
  8 comb M 16 → 1.00E¯3 |     0% ⎕                             
  8 comb 16   → 3.63E¯2 | +3526% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

A faster comb is possible without a memo operator, but it’s not easy. (See for example section 4.1 of my APL87 paper.) A memo operator addresses the performance problem with multiple recursion and restores it to the toolkit of thought.

Presented at the BAA seminar at the Royal Society of Arts on 2016-05-20

Shuffle Faster Please!

Andy reported that in the shuffle QA some functions take a long time:

m9249   “4½ days so far”
rankop  21.5 hours
m12466  26.3 hours
points  7.2 hours

Background: Shuffle QA is an intensification of the normal QA. The suite of over 1800 QA functions is run under shuffle, whereby every getspace (memory allocation) is followed by every APL array in the workspace being shifted (“shuffled”) and by a simple workspace integrity check. The purpose is to uncover memory reads/writes which are rendered unsafe by events which in normal operations are rare.

m9249

m9249 tests +\[a], ×\[a], ⌈\[a], and ⌊\[a]. It ran in under 3 seconds in normal operations; for it to take more than 4½ days under shuffle, getspace is clearly the dominant resource and hereafter we focus on that.

m9249 used expressions like:

assert (+⍀x) ≡ {⍺+⍵}⍀x

Since {⍺+⍵} is not recognized by the scan operator and therefore not supported by special code, {⍺+⍵}⍀x is done by the general O(n*2) algorithm for scans, AND on each scalar the operand {⍺+⍵} is invoked and a getspace is done. The improved, faster QA does this:

F ← {1≥≢⍵:⍵ ⋄ x ⍪ (¯1↑⍵) ⍺⍺⍨ ¯1↑x←⍺⍺ ∇∇ ¯1↓⍵}
assert (+⍀x) ≡ +F x

With such a change, the faster m9249 runs in 22 minutes under shuffle instead of over 4½ days, without any reduction in coverage.

The number of getspace are as follows: +⍀x takes O(1). The second expression {⍺+⍵}⍀x does ×\1↓⍴x separate scans, and each one takes O((≢x)*2) getspace; the total is therefore O(((≢x)*2)××/1↓⍴x) ←→ O((≢x)××/⍴x). The third expression +F is linear in ≢x and is therefore O(≢x) in the number of getspace. In m9249 the largest x has size 33 5 7. The following table shows the orders of complexity and what they imply for this largest x:

+⍀x O(1) 1
{⍺+⍵}⍀x O((≢x)××/⍴x) 38115 = 33 × ×/ 33 5 7
+F x O(≢x) 33

The ratio between 38115 and 33 goes a long way towards explaining why the time to run m9249 under shuffle was over 4½ days and is now 22 minutes. (Why does m9249 still take 22 minutes? It tests for the four functions + × ⌈ ⌊, for various axes, for different datatypes, and for different axis lengths.)

rankop

Another shuffle sluggard rankop took 47 hours. rankop tests expressions of the form f⍤r⊢yand x f⍤r⊢y for various functions f. The techniques used for reducing the number of getspace differ from function to function. We look at x↑⍤r⊢y as an illustration.

assert (4↑⍤1⊢y) ≡ 4{⍺↑⍵}⍤1⊢y

⍴y was 2 7 1 8 2 8. The expression took 7.6e¯5 seconds normally and 7.5 seconds under shuffle. The left hand side requires O(1) getspace, the right hand side O(×/¯1↓⍴x). The expression was changed to:

assert (4↑⍤1⊢y) ≡ 2 7 1 8 2 4↑y

Feels like cheating, doesn’t it? :-) The new expression takes 0.115 seconds under shuffle.

m12466

m12466 tests n⌈/[a]x (and n⌊/[a]x). Previously, it did this by comparing n⌈/[a]x against n{⍺⌈⍵}/[a]x for x with shape 7 17 13 11, for each of the four axes, for two different values of n, and for the 1-, 2-, 4-, and 8-byte numeric datatypes. It required 5238426 getspace and 26.3 hours to run under shuffle.

F←{p⍉⍺⍺⌿(⊂(⎕io-⍨⍳|⍺)∘.+⍳(1-|⍺)+(⍴⍵)[⍵⍵])⌷(⍋p←⍒⍵⍵=⍳⍴⍴⍵)⍉⍵}

F is a dyadic operator. The left operand ⍺⍺ is or ; the right operand ⍵⍵ is the axis; and n(⌈ F a)x is the same as n⌈/[a]x. n(⌈ F a)x mainly does n⌈⌿x; other axes are handled by transposing the axis of interest to be the first, then doing n⌈⌿x, then transposing the axes back into the required order. n(⌈ F a)x is much more complicated than n{⍺⌈⍵}/[a]x but has the advantage of using only O(1) getspace.

The revamped m12466 function uses 13717 getspace and 2 minutes to run under shuffle.

points

points is a function to test monadic format () for different values of ⎕pp.

points←{⍺←17 ⋄ ⎕pp←⍺
  tt←{(⌽∨\⌽⍵≠⍺)/⍵}                   ⍝ trim trailing ⍺s.    
  ~0∊∘.{                             ⍝ all pairs (⍳⍵)       
    num←'0'tt↑,/⍕¨⍺'.'⍵              ⍝ eg '9.3'             
    num≡⍕⍎num                        ⍝ check round trip     
  }⍨⍳⍵
}

The expression is 14 15 16 17 points¨ 100, meaning that for each of the 100 numbers num as text,

  1.1   1.2   1.3   1.4   1.5     1.100
  2.1   2.2   2.3   2.4   2.5     2.100
  3.1   3.2   3.3   3.4   3.5 ... 3.100
...
 99.1  99.2  99.3  99.4  99.5    99.100
100.1 100.2 100.3 100.4 100.5   100.100

a “round-trip” num≡⍕⍎num is evaluated. The expression requires 1.3 million getspace and 7.2 hours to run under shuffle.

A much more efficient version with respect to getspace is possible. Let x be a vector. ⍕¨x requires O(≢x) getspace; ⍕x requires O(1) getspace and, like ⍕¨x, formats each individual element of x independently.

points1←{
  ⎕io←⎕ml←1                                                 
  tt←{(⌽∨\⌽⍵≠⍺)/⍵}                   ⍝ trim trailing ⍺s     
  x←1↓∊(' ',¨s,¨'.')∘.,'0'tt¨s←⍕¨⍳⍵  ⍝ numbers as text      
  y←⍎x                                                      
  {⎕pp←⍵ ⋄ x≡⍕y}¨⍺                   ⍝ check round trip     
}

14 15 16 17 points1 100 requires 11050 getspace and less than 2 minutes to run under shuffle.

What to do?

Andy says that the shuffle QA takes over 2000 hours (!). The shuffle sluggards will be tackled as done here by using more parsimonious APL expressions to compare against the aspect being QA-ed. Going forward, a new QA function will be run under shuffle as it is being developed. The developers will be unlikely to tolerate a running time of 4½ days.