Is it Sorted?

Motivation

I have been working on the Dyalog APL quicksort implementation. The following programming puzzle arose in the process of doing the QA for this work.

is a simple array. Write a function sorted, without using or , such that sorted ⍵ is 1 if is sorted in ascending order and 0 otherwise.

The point about not using grade is that this is supposed to be an independent check that grade is correct (remains correct) after the new work.

Real Vectors

The simplest case is when is a numeric vector. If furthermore are not complex numbers (a case addressed later), then

   ∧/ 2≤/⍵

each item being less than or equal to the next one, checks that is sorted. Since uses exact comparisons, here we must set ⎕ct←⎕dct←0. Morever, in order that decimal floating-point numbers (DECFs) be compared correctly, here ⎕fr←1287.

Real Arrays

More generally, when is a non-complex numeric matrix, we must check that each row precedes or is equal to the next row. If c and d are consecutive rows, then corresponding items are compared and at the first item where they differ, c[i] must be less than d[i].

   ~ 0 ∊ (2>⌿⍪⍵) ⍲ <\ 2≠⌿⍪⍵

The expression incorporates two refinements:

  • If is not a matrix, first apply ⍪⍵.
  • Instead of checking c[i] is less than d[i], check that c[i] is not greater than d[i]. This finesses the case where c≡d and there is no first item where they differ; that is, the case where <\2≠⌿⍪⍵ is all 0s for that row.

<\on a boolean vector has 0s after the first 1, (and is all 0 if there are no 1s). Therefore, <\2≠⌿⍪⍵ finds the first item (if any) where one cell differs from the next cell, and that item must not be greater than the corresponding item in the next cell.

For example:

   x←?97 3⍴10

   {~ 0 ∊ (2>⌿⍪⍵) ⍲ <\ 2≠⌿⍪⍵} x
0
   {~ 0 ∊ (2>⌿⍪⍵) ⍲ <\ 2≠⌿⍪⍵} x[⍋x;]
1

(Muse: since x above are random numbers, there is a possibility that it is sorted and the first test above can be 1. But: if each elementary particle in the visible universe were a computer and every nanosecond each of them creates a random matrix and tests it for sortedness as above, running from the beginning of the time to the end of our lives, it is still a very safe bet that no 1 would result.)

For integer arrays, there is an alternative of using the signs of the arithmetic difference between successive cells:

   {~ 0 ∊ 1≠t×<\0≠t← × 2-⌿⍪⍵} x[⍋x;]
1

(The sign where consecutive cells first differ must not be 1.) However, computing the difference on floating point numbers can founder on overflow:

   ⊢ x←¯1 1×⌊/⍬
¯1.79769E308 1.79769E308

   {~ 0 ∊ 1≠t×<\0≠t← × 2-⌿⍪⍵} x
DOMAIN ERROR
   {~0∊1≠t×<\0≠t←×2-⌿⍪⍵}x
                   ∧

Complex Numbers

Two complex numbers are ordered first by the real parts and then by the imaginary parts. (This is part of the TAO extension implemented in Dyalog APL version 17.0.) Therefore, a complex array can be tested for sortedness by testing an equivalent real array with each number replaced by their real and imaginary parts, thus:

   (¯1⌽⍳1+⍴⍴⍵) ⍉ 9 11∘.○⍵
   ↑9 11∘○¨⍵
   9 11○⍤1 0⊢⍵

Although the second expression is the shortest, it is less efficient in time, space, and number of getspace calls. The last expression is favored for its brevity and performance.

The number of getspace is a worthwhile measure. Part of the QA process is a rather stringent procedure called the “Shuffle QA”. The entire Shuffle QA takes several weeks to run and its running time is directly related to the number of getspace.

Character Arrays

None of the functions < ≤ ≥ > - × are permitted on characters. This is solved by application of ⎕ucs, converting characters to integers while preserving the ordering.

Putting It All Together

sorted←{
  ⎕ct←⎕dct←0
  ⎕fr←1287
  d←10|⎕dr ⍵
  d∊0 2: ∇ ⎕ucs ⍵
  d=9:   ∇ 9 11○⍤1 0⊢⍵
  ~ 0 ∊ (2>⌿⍪⍵) ⍲ <\ 2≠⌿⍪⍵
}

Other Considerations

That ⍵⌷⍨⊂⍋⍵ is sorted is a necessary but not sufficient condition that ⍋⍵ is correct. For example, an “adversary” can supply the following results for ⍋⍵ so that ⍵⌷⍨⊂⍋⍵ is sorted:

?≢⍵
(≢⍵)⍴?≢⍵
¯1↓⍋⍵
∊ i {⊂⍵[?⍨≢⍵]}⌸⍨ ⍵⌷⍨⊂i←⍋⍵

The last expression randomly permutes the grade indices of equal cells, a result which violates the requirement that grade indices of equal cells are in ascending order. That is, grade must be stable.

In Dyalog APL version 17.0, grade has been extended to work on non-simple arrays, the much discussed TAO, total array ordering. Checking that a non-simple array is sorted without using grade requires facilities discussed in the paper TAO Axioms and is beyond the scope of this note.

Dyadic Grade

⎕io=0 is assumed throughout. The essay talks only about but the same ideas apply to .

Background

has the distinction of being the first (in 1980) APL primitive function defined on major cells: the result orders items of a vector, rows of a matrix, planes of a 3-d array, etc. In the ordering major cells are compared in ravelled order, with leading items being more significant than trailing (lexicographic ordering). Moreover, in dyadic grade ⍺⍋⍵, specifies “alphabets” to be used in comparing the items of character array .

Dyadic grade has always been an APL primitive which is hard for me to understand, in that way kind of like dyadic transpose ☺. I sat down to really understand it, starting from the simplest cases to the general case. The following is a record of my explorations.

Vector Left Argument

   gv← {⍋⍺⍳⍵}

   a0← 'abcdefghij'
   x0← 'chthonic'
      
   a0 gv x0
0 7 1 3 6 2 4 5
   a0 ⍋ x0
0 7 1 3 6 2 4 5

   x0 ⌷⍨ ⊂ a0 gv x0
cchhiton

That is, grade the indices of in . If an item of is not in then its index is ≢⍺.

Higher-Rank Left Argument with Unique Items

The coordinates of A[i;j;k;…] or A[⊂i,j,k,…] is the vector i,j,k,…. The phrase ⍳⍴A produces the array of coordinates. For example, if is the (2 26)-matrix of the upper and lower case English letters,

   ABCDEFGHIJKLMNOPQRSTUVWXYZ
   abcdefghijklmnopqrstuvwxyz

the corresponding coordinates are

   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬   ┬────┬────┐
   │0 0│0 1│0 2│0 3│0 4│0 5│0 6│0 7│0 8│   │0 24│0 25│
   ├───┼───┼───┼───┼───┼───┼───┼───┼───┼ … ├────┼────┤
   │1 0│1 1│1 2│1 3│1 4│1 5│1 6│1 7│1 8│   │1 24│1 25│
   └───┴───┴───┴───┴───┴───┴───┴───┴───┴   ┴────┴────┘

If the items of are unique,

   gu← {⍋ 0 2 1 ⍉ (⊂(,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ ⍉(⍴⍺)⊤⍳×/⍴⍺}

That is, ⍺⍋⍵ obtains as the grade of the reversed coordinates of in . (If an item does not occur in , its coordinates are ⍴⍺.) The implements that in , the first axis is least significant and the last axis is most significant. For the (2 26)-matrix above, case (the first axis) is less significant than A-Z and a-z (the last axis).

   ⊢ a1←' ',⎕av[(⎕av⍳'Aa')∘.+⍳26]
 ABCDEFGHIJKLMNOPQRSTUVWXYZ
 abcdefghijklmnopqrstuvwxyz

   ⊢ x1←↑' '(≠⊆⊢)' Jay roger Roger adam Adam jay' 
Jay  
roger
Roger
adam 
Adam 
jay  

   a1 gu x1
4 3 0 5 2 1
   a1 ⍋ x1
4 3 0 5 2 1

   x1 ⌷⍨ ⊂ a1 gu x1
Adam 
adam 
Jay  
jay  
Roger
roger

Higher-Rank Left Arguments

Suppose does have duplicates? For purposes of , the coordinates of an item c are

   ⌊⌿(c=,⍺)⌿↑,⍳⍴⍺

That is, the minimum of coordinates of all items equal to c. Note that the expression also works if c is a unique item. Therefore, for a general , with or without duplicates, ⍺⍋⍵ obtains as

   gr← {⍋ 0 2 1 ⍉ (⊂(∪,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ (,⍺) {⌊⌿⍵}⌸ ⍉(⍴⍺)⊤⍳×/⍴⍺}

The “minimum of coordinates” computation is exploited to effect equal coodinates for disparate characters. For example, an ordering where upper and lower case are significant but diacritical marks are not, can be implemented as follows:

   A    ⍝ A has a leading blank column
 AÀÁÂÃÄÅBCÇDEÈÉÊËFGHIÌÍÎÏJKLMNÑOÒÓÔÕÖØPQRSTUÙÚÛÜVWXYÝZ
 aàáâãäåbcçdeèéêëfghiìíîïjklmnñoòóôõöøpqrstuùúûüvwxyýz
 À       Ç  È       Ì        Ñ Ò                   Ý  
 Á       ç  É       Í        ñ Ó                   ý  
 Â          Ê       Î          Ô                      
 Ã          Ë       Ï          Ö                      
 Ä          è       ì          Õ                      
 Å          é       í          Ø                      
 à          ê       î          ò                      
 á          ë       ï          ó                      
 â                             ô                      
 ã                             õ                      
 ä                             ö                      
 å                             ø                      
   ⍴A
14 54

   ('È'=,A)⌿↑,⍳⍴A                ('è'=,A)⌿↑,⍳⍴A
0 13                          1 13
2 12                          6 12
   ⌊⌿('È'=,A)⌿↑,⍳⍴A              ⌊⌿('è'=,A)⌿↑,⍳⍴A
0 12                          1 12

   ('E'=,A)⌿↑,⍳⍴A                ('e'=,A)⌿↑,⍳⍴A
0 12                          1 12
   ⌊⌿('E'=,A)⌿↑,⍳⍴A              ⌊⌿('e'=,A)⌿↑,⍳⍴A
0 12                          1 12

'È' occurs twice in A with coordinates 0 13 and 2 12. The coordinates assigned to 'È' are the minimum of these, 0 12. In contrast, 'E' occurs once and its coordinates are 0 12, the same as those for 'È'. Therefore, 'E' and 'È' are considered equal for purposes of dyadic grade. Similarly, 'e' and 'è' have coordinates 1 12 and are considered equal by , but they follow 'E' and 'È' (because their coordinates are 0 12).

For example:

   ⊢ x← ↑' '(≠⊆⊢)' roger adàm Röger rÖger Adåm JÃY JAY JÃY adåm adàm'
roger
adàm 
Röger
rÖger
Adåm 
JÃY  
JAY  
JÃY  
adåm 
adàm 

   A gr x
4 1 8 9 5 6 7 2 3 0
   A ⍋ x
4 1 8 9 5 6 7 2 3 0

   x ⌷⍨⊂ A gr x
Adåm 
adàm 
adåm 
adàm 
JÃY  
JAY  
JÃY  
Röger
rÖger
roger

Lest you think that dyadic grade in its full generality suffices to implement any ordering: in “telephone book” ordering, “1600 Pennsylvania Avenue” and “Unter den Linden 23” are ordered as if 1600 were spelled out as “Sixteen Hundred” and 23 as “Dreiundzwanzig”. A program to do that ought to be très amusant.

Code Archeology

The above code are improved versions of what appeared in Peter Wooster, Extended Upgrade and Downgrade, SHARP APL Technical Notes 35, I.P. Sharp Associates, 1980-09-15. It is interesting to study the code from the two eras. (The code from 1980 is lightly edited for executability and clarity.)

2018

gv← {⍋⍺⍳⍵}
gu← {⍋ 0 2 1 ⍉ (⊂(,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ ⍉(⍴⍺)⊤⍳×/⍴⍺}
gr← {⍋ 0 2 1 ⍉ (⊂(∪,⍺)⍳⍪⍵) ⌷ ⌽ (⍴⍺) ⍪⍨ (,⍺) {⌊⌿⍵}⌸ ⍉(⍴⍺)⊤⍳×/⍴⍺}

1980

eu← {d⊤⍳×/d←⍴⍵}
er← {¯1+÷(÷1+d⊤⍳×/d←⍴⍵)⌈.×a∘.=a←,⍵}

fv← {⍋⍺⍳⍵}
fu← {⍋(⍒0 1,1↓0×⍳⍴⍴⍵)⍉(⊖(eu ⍺),⍴⍺)[;(,⍺)⍳⍵]}
fr← {⍋(⍒0 1,1↓0×⍳⍴⍴⍵)⍉(⊖(er ⍺),⍴⍺)[;(,⍺)⍳⍵]}
gv, fv vector left argument
gu, fu higher-ranked left argument with unique items
gr, fr higher-ranked left argument

In the sequence gv gu gr, a function is more general than the preceding one and subsumes it. Likewise fv fu fr.

Comparison of the code illustrates advances in APL between 1980 and 2018:

  • {⌊⌿⍵}⌸ minimum of major cells corresponding to identical keys
  • ∪      unique items
  • ⍪⍵     ravel major cells
  • ⍺⍪⍵    catenate on first axis
  • ⍨      commute operator
  • dfns

Alternatives

If a left argument is large and complicated and is used repeatedly, it may be worthwhile for the APL interpreter to perform precomputations on it. Thus:

   U← ∪,A
   C← ⌽ (⍴A) ⍪⍨ (,A) {⌊⌿⍵}⌸ ⍉(⍴A)⊤⍳×/⍴A

   ⍴U        ⍴C
107       108 2

   ⍪U        C
           0  0
A          1  0
À          1  0
Á          1  0
          1  0
à         1  0
Ä          1  0
Å          1  0
B          8  0
C          9  0
Ç          9  0
…           …
x         50  1
y         51  1
ý         51  1
z         53  1
          14 54

   gp← (U C)∘{U C←⍺ ⋄ ⍋0 2 1⍉C[U⍳⍪⍵;]}

   gp x
4 1 8 9 5 6 7 2 3 0
   A ⍋ x
4 1 8 9 5 6 7 2 3 0

It makes sense that the number of columns in the coordinate matrix C is equal to the rank of the alphabet array A: The rank is the number of dimensions, a-z, upper/lower case, color, etc.; each row of C is a vector of the numeric value for each dimension.

With 20/20 hindsight, the above code can be seen as an argument against defining dyadic grade to do ordering with specified alphabets. After all,

   ⍺⍋⍵  ←→  ⍋0 2 1⍉C[U⍳⍪⍵;]

and specifying U and C directly makes the computation easier to understand, easier to use, and as it happens is faster than the primitive in the current implementation. The inverse calculation, from U C to the alphabet array A, is an amusing bit of code left as an exercise for the reader☺.

One can further argue that the current definition of dyadic grade precludes an alternative attractive but incompatible definition:

   ⍺⍋⍵  ←→  ⍺⌷⍨⊂⍋⍵

That is, order by the grade of , whence ⍋⍨ sorts. In Dyalog APL version 17.0, monadic grade is extended to work with a TAO (total array ordering). With a TAO and this alternative definition, ⍋⍨ sorts any array.

The present exposition exposes a difficulty with extending the current dyadic grade to work with TAO: It is axiomatic that monadic grade compares cells itemwise, stopping at the first pair of unequal items. Dyadic grade does not do that in general. For example, with an upper and lower case alphabet, you don’t stop comparing 'Rogerz' and 'rogers' on encountering 'R' and 'r'.

Linear Interpolation

⎕io=0 assumed throughout; works in 1-origin with the obvious modifications.

Introduction

On Wednesday, a question arrived via Dyalog Support from an intern in Africa: If M is the matrix on the left, use linear interpolation to compute the result on the right.

   1 20         1 20
   4 80         2 40
   6 82         3 60
                4 80
                5 81
                6 82

Linear Interpolation

Two points (x0,y0) and (x1,y1) specify a line; for any x there is a unique y on that line (assuming x0≠x1). The equation for the line derives as follows, starting from its slope m:

   m = (y1-y0) ÷ (x1-x0)
   (y-y0) = m × (x-x0)
   y = y0 + m × (x-x0)

Therefore, if is a 2-by-2 matrix of the two points and are the x-values to be interpolated, then:

   g ← {(⊃⌽⍺)+(⍵-⊃⍺)÷÷/-⌿⍺}

   ⊢ M←1 4 6,⍪20 80 82
1 20
4 80
6 82

   M[0 1;] g 2 3
40 60
   M[1 2;] g 5
81

A New Twist, A New Solution

The problem as posed implicitly required that:

  • The x-values are the positive integers bounded by ⊃⊖M.
  • Appropriate rows of the matrix are selected for a given x-value.
  • The missing x-values and their interpolations are “slotted back” into the argument matrix.

These requirements are best met by , interval index, a relatively new primitive function introduced in Dyalog APL version 16.0. The left argument must be sorted and partitions the universe into disjoint contiguous intervals; ⍺⍸⍵ finds the index of the interval which contains an item of . The result is ⎕io dependent.

For the given matrix M, the partition (of the real numbers in this case) is depicted below. As in conventional mathematical notation, [ denotes that the interval includes the left end-point and ) denotes that the interval excludes the right end-point.

          1        4      6
─────────)[───────)[─────)[──────────
     ¯1       0       1       2

   v←¯5 0 1 2.5 6 3 4 5 9 8 7

   1 4 6 ⍸ v
¯1 ¯1 0 0 2 0 1 1 2 2 2

   v ,[¯0.5] 1 4 6 ⍸ v
¯5  0 1 2.5 6 3 4 5 9 8 7
¯1 ¯1 0 0   2 0 1 1 2 2 2

With in hand, the problem can be solved as follows:

interpol←{
  (x y)←↓⍉⍵
  m←m,⊃⌽m←(2-/y)÷(2-/x)
  j←0⌈x⍸i←1+⍳⊃⌽x
  i,⍪y[j]+m[j]×i-x[j]
}

   interpol M
1 20
2 40
3 60
4 80
5 81
6 82

The problem of x-values less than the first end-point is finessed by applying 0⌈ to the interval indices, and that of x-values greater than or equal to the last end-point is finessed by repeating the last slope m←m,⊃⌽m.

It is possible to do the interpolation only on the missing indices (2 3 5 in this case) and insert them into the argument matrix. It seems neater to simply interpolate everything, and in so doing provide a check that the interpolated values equal the values given in the argument.

An Alternative Interpolation

Interpolating according to two selected rows of a matrix of points treats the function as piecewise linear, with sharp inflection points where the lines join (different slopes between adjacent lines). A “holistic” alternative approach is possible: the matrix can be interpreted as specifying a single line and the interpolation is according to this single line. The primitive function computes the coefficients of the line which best fits the points:

   ⎕rl←7*5  ⍝ for reproducible random numbers

   ⊢ M←t,⍪(?7⍴5)+¯17+3×t←?7⍴100
35  89
98 278
19  44
 4  ¯5
62 170
49 133
25  59

   M[;1] ⌹ 1,M[;,0]    ⍝ y-intercept and slope
¯15.3164 2.99731

   interpola ← {(1,⍤0⊢⍵)+.×⍺[;1]⌹1,⍺[;,0]}

   M[;1] ,[¯0.5] M interpola M[;0]
89      278    44      ¯5       170     133     59     
89.5895 278.42 41.6325 ¯3.32713 170.517 131.552 59.6164

   M interpola 33 35 37 39.7
83.5949 89.5895 95.5841 103.677

Finally

Our best wishes to the intern. Welcome to APL!

Stencil Lives

⎕io←0 throughout. ⎕io delenda est.

Stencil

A stencil operator is available with Dyalog version 16.0. In brief, stencil is a dyadic operator f⌺s which applies f to (possibly overlapping) rectangles. The size of the rectangle and its movement are controlled by s. For example, enclosing 3-by-3 rectangles with default movements of 1:

   ⊢ a←4 5⍴⎕a
ABCDE
FGHIJ
KLMNO
PQRST
   {⊂⍵}⌺3 3 ⊢a
┌───┬───┬───┬───┬───┐
│   │   │   │   │   │
│ AB│ABC│BCD│CDE│DE │
│ FG│FGH│GHI│HIJ│IJ │
├───┼───┼───┼───┼───┤
│ AB│ABC│BCD│CDE│DE │
│ FG│FGH│GHI│HIJ│IJ │
│ KL│KLM│LMN│MNO│NO │
├───┼───┼───┼───┼───┤
│ FG│FGH│GHI│HIJ│IJ │
│ KL│KLM│LMN│MNO│NO │
│ PQ│PQR│QRS│RST│ST │
├───┼───┼───┼───┼───┤
│ KL│KLM│LMN│MNO│NO │
│ PQ│PQR│QRS│RST│ST │
│   │   │   │   │   │
└───┴───┴───┴───┴───┘

Stencil is also known as stencil code, tile, tessellation, and cut. It has applications in artificial neural networks, computational fluid dynamics, cellular automata, etc., and of course in Conway’s Game of Life.

The Rules of Life

Each cell of a boolean matrix has 8 neighbors adjacent to it horizontally, vertically, or diagonally. The Game of Life concerns the computation of the next generation boolean matrix.

(0) A 0-cell with 3 neighboring 1-cells becomes a 1.

(1) A 1-cell with 2 or 3 neighboring 1-cells remains at 1.

(2) All other cells remain or become a 0.

There are two main variations on the treatment of cells on the edges of the matrix: (a) the matrix is surrounded by a border of 0s; or (b) cells on an edge are adjacent to cells on the opposite edge, as on a torus.

There is an implementation of life in the dfns workspace and explained in a YouTube video. It assumes a toroidal topology.

   life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}    ⍝ John Scholes

   ⊢ glider←5 5⍴0 0 1 0 0 1 0 1 0 0 0 1 1,12⍴0
0 0 1 0 0
1 0 1 0 0
0 1 1 0 0
0 0 0 0 0
0 0 0 0 0
   life glider
0 1 0 0 0
0 0 1 1 0
0 1 1 0 0
0 0 0 0 0
0 0 0 0 0

   {'.⍟'[⍵]}¨ (⍳8) {life⍣⍺⊢⍵}¨ ⊂glider
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│..⍟..│.⍟...│..⍟..│.....│.....│.....│.....│.....│
│⍟.⍟..│..⍟⍟.│...⍟.│.⍟.⍟.│...⍟.│..⍟..│...⍟.│.....│
│.⍟⍟..│.⍟⍟..│.⍟⍟⍟.│..⍟⍟.│.⍟.⍟.│...⍟⍟│....⍟│..⍟.⍟│
│.....│.....│.....│..⍟..│..⍟⍟.│..⍟⍟.│..⍟⍟⍟│...⍟⍟│
│.....│.....│.....│.....│.....│.....│.....│...⍟.│
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘

Stencil Lives

It has long been known that stencil facilitates Game of Life computations. Eugene McDonnell explored the question in the APL88 paper Life: Nasty, Brutish, and Short [Ho51]. The shortest of the solutions derive as follows.

By hook or by crook, find all the 3-by-3 boolean matrices U which lead to a middle 1. A succinct Game of Life then obtains.

   B ← {1 1⌷life 3 3⍴(9⍴2)⊤⍵}¨ ⍳2*9
   U ← {3 3⍴(9⍴2)⊤⍵}¨ ⍸B  ⍝ ⍸ ←→ {⍵/⍳⍴⍵}

   life1 ← {U ∊⍨ {⊂⍵}⌺3 3⊢⍵}    ⍝ Eugene McDonnell

Comparing life and life1, and also illustrating that the toroidal and 0-border computations can be expressed one with the other.

   b←1=?97 103⍴3

   x←1 1↓¯1 ¯1↓ life 0,0,⍨0⍪0⍪⍨b
   y←life1 b
   x≡y
1

   g←{(¯1↑⍵)⍪⍵⍪1↑⍵}
   p←life b
   q←1 1↓¯1 ¯1↓ life1 (g b[;102]),(g b),(g b[;0])
   p≡q
1

Adám Brudzewsky points out that life can be terser as a train (fork):

   life1a ← U ∊⍨ ⊢∘⊂⌺3 3    ⍝ Adám Brudzewsky
   life1b ← U ∊⍨ {⊂⍵}⌺3 3

   (life1 ≡ life1a) b
1
   (life1 ≡ life1b) b
1

life1 is an example of implementing a calculation by look-up rather than by a more conventional computation, discussed in a recent blog post. There is a variation which is more efficient because the look-up is effected with integers rather than boolean matrices:

   A←3 3⍴2*⌽⍳9
   life2 ← {B[{+/,A×⍵}⌺3 3⊢⍵]}

   (life1 ≡ life1b) b
1

Jay Foad offers another stencil life, translating an algorithm in k by Arthur Whitney:

   life3 ← {3=s-⍵∧4=s←{+/,⍵}⌺3 3⊢⍵}    ⍝ Jay Foad

   (life1 ≡ life3) b
1

The algorithm combines the life rules into a single expression, wherein s←{+/,⍵}⌺3 3 ⊢⍵

(0) for 0-cells s is the number of neighbors; and
(1) for 1-cells s is the number of neighbors plus 1, and the plus 1 only matters if s is 4.

The same idea can be retrofit into the toroidal life:

   lifea←{3=s-⍵∧4=s←⊃+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

   (life ≡ lifea) b
1

Collected Definitions and Timings

life   ← {↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
lifea  ← {3=s-⍵∧4=s←⊃+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

  B←{1 1⌷life 3 3⍴(9⍴2)⊤⍵}¨ ⍳2*9
  U←{3 3⍴(9⍴2)⊤⍵}¨ ⍸ B
  A←3 3⍴2*⌽⍳9

life1  ← {U ∊⍨ {⊂⍵}⌺3 3⊢⍵}
life1a ← U ∊⍨ ⊢∘⊂⌺3 3
life1b ← U ∊⍨ {⊂⍵}⌺3 3
life2  ← {B[{+/,A×⍵}⌺3 3⊢⍵]}
life3  ← {3=s-⍵∧4=s←{+/,⍵}⌺3 3⊢⍵}

   cmpx (⊂'life') ,¨ '1' '1a' '1b' '2' '3' '' 'a' ,¨ ⊂' b'
  life1 b  → 2.98E¯3 |    0% ⎕⎕⎕⎕⎕
  life1a b → 1.97E¯2 | +561% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  life1b b → 2.99E¯3 |    0% ⎕⎕⎕⎕⎕
  life2 b  → 2.71E¯4 |  -91%
  life3 b  → 6.05E¯5 |  -98%
* life b   → 1.50E¯4 |  -95%
* lifea b  → 1.41E¯4 |  -96%

The * indicate that life and lifea give a different result (toroidal v 0-border).

life1a is much slower than the others because ⊢∘⊂⌺ is not implemented by special code.

{+/,⍵}⌺ is the fastest of the special codes because the computation has mathematical properties absent from {⊂⍵}⌺ and {+/,A×⍵}⌺.

The effect of special code v not, can be observed (for example) by use of redundant parentheses:

   cmpx '{+/,⍵}⌺3 3⊢b' '{+/,(⍵)}⌺3 3⊢b'
  {+/,⍵}⌺3 3⊢b   → 3.13E¯5  |      0%
  {+/,(⍵)}⌺3 3⊢b → 2.63E¯2  | +83900% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   cmpx '{+/,A×⍵}⌺3 3⊢b' '{+/,A×(⍵)}⌺3 3⊢b'
  {+/,A×⍵}⌺3 3⊢b   → 2.42E¯4 |      0%
  {+/,A×(⍵)}⌺3 3⊢b → 2.98E¯2 | +12216% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   ⍝ no special code in either of the following expressions
   cmpx '{+/,2×⍵}⌺3 3⊢b' '{+/,2×(⍵)}⌺3 3⊢b'
  {+/,2×⍵}⌺3 3⊢b   → 2.92E¯2 |      0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  {+/,2×(⍵)}⌺3 3⊢b → 3.03E¯2 |     +3% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

That life and lifea do not use stencil and yet are competitive, illustrates the efficacy of boolean operations and of letting primitives “see” large arguments.

Finally, if you wish to play with stencil a description and a hi-fi model of it can be found here.

Stencil Maneuvers

Introduction

The e-mail arrived in the early afternoon from Morten, in Finland attending the FinnAPL Forest Seminar.

How do I speed this up and impress the Finns?

0 cmpx 'e←⊃∨/0.2 edges¨r g b'
6.4E¯1
edges
{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply ⍵)÷1⌈(+⌿÷≢),⍵}
apply
{stencil←⍺ ⋄ {+/,⍵×stencil}⌺(⍴stencil)⊢⍵}
EdgeDetect
¯1 ¯1 ¯1
¯1 8 ¯1
¯1 ¯1 ¯1

(r g b in the attached ws)

Background

is stencil, a new dyadic operator which will be available in version 16.0. In brief, f⌺s applies the operand function f to windows of size s. The window is moved over the argument centered over every possible position. The size is commonly odd and the movement commonly 1. For example:

   {⊂⍵}⌺5 3⊢6 5⍴⍳30
┌───────┬────────┬────────┬────────┬───────┐
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
├───────┼────────┼────────┼────────┼───────┤
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
├───────┼────────┼────────┼────────┼───────┤
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
├───────┼────────┼────────┼────────┼───────┤
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
├───────┼────────┼────────┼────────┼───────┤
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
├───────┼────────┼────────┼────────┼───────┤
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
└───────┴────────┴────────┴────────┴───────┘
   {+/,⍵}⌺5 3⊢6 5⍴⍳30
 39  63  72  81  57
 72 114 126 138  96
115 180 195 210 145
165 255 270 285 195
152 234 246 258 176
129 198 207 216 147

In addition, for matrix right arguments with movement 1, special code is provided for the following operand functions:

{∧/,⍵}   {∨/,⍵}   {=/,⍵}   {≠/,⍵} 

{⍵}      {,⍵}     {⊂⍵}     {+/,⍵}

{+/,A×⍵}     {E<+/,A×⍵}       
{+/⍪A×⍤2⊢⍵}  {E<+/⍪A×⍤2⊢⍵}

The comparison < can be one of < ≤ = ≠ ≥ >.

The variables r g b in the problem are integer matrices with shape 227 316 having values between 0 and 255. For present purposes we can initialize them to random values:

   r←¯1+?227 316⍴256
   g←¯1+?227 316⍴256
   b←¯1+?227 316⍴256

Opening Moves

I had other obligations and did not attend to the problem until 7 PM. A low-hanging fruit was immediately apparent: {+/,⍵×A}⌺s is not implemented by special code but {+/,A×⍵}⌺s is. The difference in performance is significant:

   edges1←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply1 ⍵)÷1⌈(+⌿÷≢),⍵}
   apply1←{stencil←⍺ ⋄ {+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx 'e←⊃∨/0.2 edges1¨r g b' 'e←⊃∨/0.2 edges ¨r g b'
  e←⊃∨/0.2 edges1¨r g b → 8.0E¯3 |     0% ⎕
  e←⊃∨/0.2 edges ¨r g b → 6.1E¯1 | +7559% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

I fired off an e-mail reporting this good result, then turned to other more urgent matters. An example of the good being an enemy of the better, I suppose.

When I returned to the problem at 11 PM, the smug good feelings have largely dissipated:

  • Why should {+/,A×⍵}⌺s be so much faster than {+/,⍵×A}⌺s? That is, why is there special code for the former but not the latter? (Answer: all the special codes end with … ⍵}.)
  • I can not win: If the factor is small, then why isn’t it larger; if it is large, it’s only because the original code wasn’t very good.
  • The large factor is because, for this problem, C (special code) is still so much faster than APL (magic function).
  • If the absolute value can somehow be replaced, the operand function can then be in the form {E<+/,A×⍵}⌺s, already implemented by special code. Alternatively, {E<|+/,A×⍵}⌺s can be implemented by new special code.

Regarding the last point, the performance improvement potentially can be:

   edges2←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺ EdgeDetect)apply2 ⍵}
   apply2←{threshold stencil←⍺ ⋄ 
                    {threshold<+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨'21 ',¨⊂'¨r g b'
  e←⊃∨/0.2 edges2¨r g b → 4.8E¯3 |      0%
* e←⊃∨/0.2 edges1¨r g b → 7.5E¯3 |    +57%
* e←⊃∨/0.2 edges ¨r g b → 7.4E¯1 | +15489% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The * in the result of cmpx indicates that the expressions don’t give the same results. That is expected; edges2 is not meant to give the same results but is to get a sense of the performance difference. (Here, an additional factor of 1.57.)

edges1 has the expression ⍺<matrix÷1⌈(+⌿÷≢),⍵ where and the divisor are both scalars. Reordering the expression eliminates one matrix operation: (⍺×1⌈(+⌿÷≢),⍵)<matrix. Thus:

   edges1a←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|EdgeDetect apply1 ⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨('1a' '1 ' '  '),¨⊂'¨r g b'
  e←⊃∨/0.2 edges1a¨r g b → 6.3E¯3 |      0%
  e←⊃∨/0.2 edges1 ¨r g b → 7.6E¯3 |    +22%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +10232% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The time was almost 1 AM. To bed.

Further Maneuvers

Fresh and promising ideas came with the morning. The following discussion applies to the operand function {+/,A×⍵}.

(0) Scalar multiple: If all the elements of A are equal, then {+/,A×⍵}⌺(⍴A)⊢r ←→ (⊃A)×{+/,⍵}⌺(⍴A)⊢r.

   A←3 3⍴?17
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (⊃A)×{+/,⍵}⌺(⍴A)⊢r
1

(1) Sum v inner product: {+/,⍵}⌺(⍴A)⊢r is significantly faster than {+/,A×⍵}⌺(⍴A)⊢r because the former exploits mathematical properties absent from the latter.

   A←?3 3⍴17
   cmpx '{+/,⍵}⌺(⍴A)⊢r' '{+/,A×⍵}⌺(⍴A)⊢r'
  {+/,⍵}⌺(⍴A)⊢r   → 1.4E¯4 |    0% ⎕⎕⎕
* {+/,A×⍵}⌺(⍴A)⊢r → 1.3E¯3 | +828% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

(The * in the result of cmpx is expected.)

(2) Linearity: the stencil of the sum equals the sum of the stencils.

   A←?3 3⍴17
   B←?3 3⍴17
   ({+/,(A+B)×⍵}⌺(⍴A)⊢r) ≡ ({+/,A×⍵}⌺(⍴A)⊢r) + {+/,B×⍵}⌺(⍴A)⊢r 
1

(3) Middle: If B is zero everywhere except the middle, then {+/,B×⍵}⌺(⍴B)⊢r ←→ mid×r where mid is the middle value.

   B←(⍴A)⍴0 0 0 0 9
   B
0 0 0
0 9 0
0 0 0
   ({+/,B×⍵}⌺(⍴B)⊢r) ≡ 9×r
1

(4) A faster solution.

   A←EdgeDetect
   B←(⍴A)⍴0 0 0 0 9
   C←(⍴A)⍴¯1
   A B C
┌────────┬─────┬────────┐
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
│¯1  8 ¯1│0 9 0│¯1 ¯1 ¯1│
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
└────────┴─────┴────────┘
   A ≡ B+C
1

Whence:

   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) + {+/,C×⍵}⌺(⍴A)⊢r ⍝ (2)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) - {+/,⍵}⌺(⍴A)⊢r   ⍝ (0)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (9×r) - {+/,⍵}⌺(⍴A)⊢r                ⍝ (3)
1

Putting it all together:

edges3←{
  ⍺←0.7
  mid←⊃EdgeDetect↓⍨⌊2÷⍨⍴EdgeDetect
  1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|(⍵×1+mid)-{+/,⍵}⌺(⍴EdgeDetect)⊢⍵
}

Comparing the various edges:

   x←(⊂'e←⊃∨/0.2 edges'),¨('3 ' '2 ' '1a' '1 ' '  '),¨⊂'¨r g b'
   cmpx x
  e←⊃∨/0.2 edges3 ¨r g b → 3.4E¯3 |      0%
* e←⊃∨/0.2 edges2 ¨r g b → 4.3E¯3 |    +25%
  e←⊃∨/0.2 edges1a¨r g b → 6.4E¯3 |    +88%
  e←⊃∨/0.2 edges1 ¨r g b → 7.5E¯3 |   +122%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +19022% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
edges original
edges1 uses {+/,stencil×⍵} instead of {+/,⍵×stencil}
edges1a uses (⍺×mean,⍵)<matrix instead of ⍺<matrix÷mean,⍵
edges2 uses {threshold<+/,stencil×⍵}
edges3 uses the faster {+/,⍵} instead of {+/,stencil×⍵} and other maneuvers

Fin

I don’t know if the Finns would be impressed. The exercise has been amusing in any case.

Calculation v Look-Up

(⎕io←0 and timings are done in Dyalog version 15.0.)

Table Look-Up

Some functions can be computed faster by table look-up than a more traditional and more conventional calculation. For example:

      b←?1e6⍴2

      cmpx '*b' '(*0 1)[b]'
 *b        → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*0 1)[b] → 1.23E¯3 | -91% ⎕⎕⎕

Some observations about this benchmark:

(a) The advantage of table look-up depends on the time to calculate the function directly, versus the time to do indexing, ⍺[⍵] or ⍵⌷⍺, plus a small amount of time to calculate the function on a (much) smaller representative set.

Indexing is particularly efficient when the indices are boolean:

      bb←b,1
      bi←b,2
      cmpx '2 3[bb]' '2 3 3[bi]'
 2 3[bb]   → 1.12E¯4 |    0% ⎕⎕⎕⎕⎕
 2 3 3[bi] → 7.39E¯4 | +558% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

bi is the same as bb except that the last element of 2 forces it to be 1-byte integers (for bi) instead of 1-bit booleans (for bb).

(b) Although the faster expression works best with 0-origin, the timing for 1-origin is similar.

      cmpx '*b' '{⎕io←0 ⋄ (*0 1)[⍵]}b'
 *b                   → 1.32E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 {⎕io←0 ⋄ (*0 1)[⍵]}b → 1.22E¯3 | -91% ⎕⎕⎕

That is, if the current value of ⎕io is 1, the implementation can use a local ⎕io setting of 0.

(c) The fact that *b is currently slower than (*0 1)[b] represents a judgment by the implementers that *b is not a sufficiently common computation to warrant writing faster code for it. Other similar functions already embody the table look-up technique:

      cmpx '⍕b' '¯1↓,(2 2⍴''0 1 '')[b;]'
 ⍕b                   → 6.95E¯4 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ¯1↓,(2 2⍴'0 1 ')[b;] → 6.92E¯4 | -1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Small Domains

Table look-up works best for booleans, but it also works for other small domains such as 1-byte integers:

      i1←?1e6⍴128    ⍝ 1-byte integers

      cmpx '*i1' '(*⍳128)[i1]'
 *i1         → 1.48E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (*⍳128)[i1] → 9.30E¯4 | -94% ⎕⎕

      cmpx '○i1' '(○⍳128)[i1]'
 ○i1         → 2.78E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 (○⍳128)[i1] → 9.25E¯4 | -67% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

In general, a table for 1-byte integers needs to have values for ¯128+⍳256. Here ⍳128 is used to simulate that in C indexing can be done on 1-byte indices without extra computation on the indices. In APL, general 1-byte indices are used as table[i1+128]; in C, the expression is (table+offset)[i].

Domains considered “small” get bigger all the time as machines come equipped with ever larger memories.

Larger Domains

Table look-up is applicable when the argument is not a substantial subset of a large domain and there is a fast way to detect that it is not.

      i4s←1e7+?1e6⍴1e5    ⍝ small-range 4-byte integers

      G←{(⊂u⍳⍵)⌷⍺⍺ u←∪⍵}

      cmpx '⍟i4s' '⍟G i4s'
 ⍟i4s   → 1.44E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
 ⍟G i4s → 9.59E¯3 | -34% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The definition of G implements that the argument is not used directly as indices (as in (⊂⍵)⌷⍺⍺ u) but needed to be looked up, the u⍳⍵ in (⊂u⍳⍵)⌷⍺⍺ u. Thus both index-of and indexing are key enablers for making table look-up competitive.

§4.3 of Notation as a Tool of Thought presents a different way to distribute the results of a calculation on the “nub” (unique items). But it takes more time and space and is limited to numeric scalar results.

      H←{(⍺⍺ u)+.×(u←∪⍵)∘.=⍵}

      i4a←1e7+?1e4⍴1e3    ⍝ small-range 4-byte integers

      cmpx '⍟i4a' '⍟G i4a' '⍟H i4a'
 ⍟i4a   → 1.56E¯4 |       0%
 ⍟G i4a → 6.25E¯5 |     -60%
 ⍟H i4a → 1.78E¯1 | +113500% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The benchmark is done on the smaller i4a as a benchmark on i4s founders on the ≢∪⍵ by ≢⍵ boolean matrix (for i4s, 12.5 GB = ×/ 0.125 1e5 1e6) created by H.

Mathematical Background

New Math” teaches that a function is a set of ordered pairs whose first items are unique:

*⍵: {(⍵,*⍵)|⍵∊C}      The set of all (⍵,*⍵) where is a complex number
⍟⍵: {(⍵,⍟⍵)|⍵∊C~{0}}  The set of all (⍵,⍟⍵) where is a non-zero complex number
○⍵: {(⍵,○⍵)|⍵∊C}      The set of all (⍵,○⍵) where is a complex number
⍕⍵: {(⍵,⍕⍵)|0=⍴⍴⍵}    The set of all (⍵,⍕⍵) where is a scalar

When is restricted (for example) to the boolean domain, the functions, the sets of ordered pairs, are more simply represented by enumerating all the possibilities:

*⍵: {(0,1), (1,2.71828)}
⍟⍵: {(0,Err), (1,0)}
○⍵: {(0,0), (1,3.14159)}
⍕⍵: {(0,'0'), (1,'1')}

Realized and Potential Performance Improvements

realized (available no later than 15.0)

boolean
1-byte integer
⍕ ∘.f
⍕     a∘⍕

potential (non-exhaustive list)

boolean
1-byte integer
2-byte integer
small-range 4-byte integer
* ⍟ | - ○   ! ⍳ a∘f f.g
* ⍟ | - ○ × ! ⍳ a∘f f.g ∘.f
* ⍟ | - ○ ×   ⍳ a∘f
* ⍟ |     ×

a is a scalar; f and g are primitive scalar dyadic functions.