Stencil Maneuvers

Introduction

The e-mail arrived in the early afternoon from Morten, in Finland attending the FinnAPL Forest Seminar.

How do I speed this up and impress the Finns?

0 cmpx 'e←⊃∨/0.2 edges¨r g b'
6.4E¯1
edges
{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply ⍵)÷1⌈(+⌿÷≢),⍵}
apply
{stencil←⍺ ⋄ {+/,⍵×stencil}⌺(⍴stencil)⊢⍵}
EdgeDetect
¯1 ¯1 ¯1
¯1 8 ¯1
¯1 ¯1 ¯1

(r g b in the attached ws)

Background

is stencil, a new dyadic operator which will be available in version 16.0. In brief, f⌺s applies the operand function f to windows of size s. The window is moved over the argument centered over every possible position. The size is commonly odd and the movement commonly 1. For example:

   {⊂⍵}⌺5 3⊢6 5⍴⍳30
┌───────┬────────┬────────┬────────┬───────┐
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
├───────┼────────┼────────┼────────┼───────┤
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
├───────┼────────┼────────┼────────┼───────┤
│0  1  2│ 1  2  3│ 2  3  4│ 3  4  5│ 4  5 0│
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
├───────┼────────┼────────┼────────┼───────┤
│0  6  7│ 6  7  8│ 7  8  9│ 8  9 10│ 9 10 0│
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
├───────┼────────┼────────┼────────┼───────┤
│0 11 12│11 12 13│12 13 14│13 14 15│14 15 0│
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
├───────┼────────┼────────┼────────┼───────┤
│0 16 17│16 17 18│17 18 19│18 19 20│19 20 0│
│0 21 22│21 22 23│22 23 24│23 24 25│24 25 0│
│0 26 27│26 27 28│27 28 29│28 29 30│29 30 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
│0  0  0│ 0  0  0│ 0  0  0│ 0  0  0│ 0  0 0│
└───────┴────────┴────────┴────────┴───────┘
   {+/,⍵}⌺5 3⊢6 5⍴⍳30
 39  63  72  81  57
 72 114 126 138  96
115 180 195 210 145
165 255 270 285 195
152 234 246 258 176
129 198 207 216 147

In addition, for matrix right arguments with movement 1, special code is provided for the following operand functions:

{∧/,⍵}   {∨/,⍵}   {=/,⍵}   {≠/,⍵} 

{⍵}      {,⍵}     {⊂⍵}     {+/,⍵}

{+/,A×⍵}     {E<+/,A×⍵}       
{+/⍪A×⍤2⊢⍵}  {E<+/⍪A×⍤2⊢⍵}

The comparison < can be one of < ≤ = ≠ ≥ >.

The variables r g b in the problem are integer matrices with shape 227 316 having values between 0 and 255. For present purposes we can initialize them to random values:

   r←¯1+?227 316⍴256
   g←¯1+?227 316⍴256
   b←¯1+?227 316⍴256

Opening Moves

I had other obligations and did not attend to the problem until 7 PM. A low-hanging fruit was immediately apparent: {+/,⍵×A}⌺s is not implemented by special code but {+/,A×⍵}⌺s is. The difference in performance is significant:

   edges1←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓⍺<(|EdgeDetect apply1 ⍵)÷1⌈(+⌿÷≢),⍵}
   apply1←{stencil←⍺ ⋄ {+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx 'e←⊃∨/0.2 edges1¨r g b' 'e←⊃∨/0.2 edges ¨r g b'
  e←⊃∨/0.2 edges1¨r g b → 8.0E¯3 |     0% ⎕
  e←⊃∨/0.2 edges ¨r g b → 6.1E¯1 | +7559% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

I fired off an e-mail reporting this good result, then turned to other more urgent matters. An example of the good being an enemy of the better, I suppose.

When I returned to the problem at 11 PM, the smug good feelings have largely dissipated:

  • Why should {+/,A×⍵}⌺s be so much faster than {+/,⍵×A}⌺s? That is, why is there special code for the former but not the latter? (Answer: all the special codes end with … ⍵}.)
  • I can not win: If the factor is small, then why isn’t it larger; if it is large, it’s only because the original code wasn’t very good.
  • The large factor is because, for this problem, C (special code) is still so much faster than APL (magic function).
  • If the absolute value can somehow be replaced, the operand function can then be in the form {E<+/,A×⍵}⌺s, already implemented by special code. Alternatively, {E<|+/,A×⍵}⌺s can be implemented by new special code.

Regarding the last point, the performance improvement potentially can be:

   edges2←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺ EdgeDetect)apply2 ⍵}
   apply2←{threshold stencil←⍺ ⋄ 
                    {threshold<+/,stencil×⍵}⌺(⍴stencil)⊢⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨'21 ',¨⊂'¨r g b'
  e←⊃∨/0.2 edges2¨r g b → 4.8E¯3 |      0%
* e←⊃∨/0.2 edges1¨r g b → 7.5E¯3 |    +57%
* e←⊃∨/0.2 edges ¨r g b → 7.4E¯1 | +15489% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The * in the result of cmpx indicates that the expressions don’t give the same results. That is expected; edges2 is not meant to give the same results but is to get a sense of the performance difference. (Here, an additional factor of 1.57.)

edges1 has the expression ⍺<matrix÷1⌈(+⌿÷≢),⍵ where and the divisor are both scalars. Reordering the expression eliminates one matrix operation: (⍺×1⌈(+⌿÷≢),⍵)<matrix. Thus:

   edges1a←{⍺←0.7 ⋄ 1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|EdgeDetect apply1 ⍵}

   cmpx (⊂'e←⊃∨/0.2 edges'),¨('1a' '1 ' '  '),¨⊂'¨r g b'
  e←⊃∨/0.2 edges1a¨r g b → 6.3E¯3 |      0%
  e←⊃∨/0.2 edges1 ¨r g b → 7.6E¯3 |    +22%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +10232% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The time was almost 1 AM. To bed.

Further Maneuvers

Fresh and promising ideas came with the morning. The following discussion applies to the operand function {+/,A×⍵}.

(0) Scalar multiple: If all the elements of A are equal, then {+/,A×⍵}⌺(⍴A)⊢r ←→ (⊃A)×{+/,⍵}⌺(⍴A)⊢r.

   A←3 3⍴?17
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (⊃A)×{+/,⍵}⌺(⍴A)⊢r
1

(1) Sum v inner product: {+/,⍵}⌺(⍴A)⊢r is significantly faster than {+/,A×⍵}⌺(⍴A)⊢r because the former exploits mathematical properties absent from the latter.

   A←?3 3⍴17
   cmpx '{+/,⍵}⌺(⍴A)⊢r' '{+/,A×⍵}⌺(⍴A)⊢r'
  {+/,⍵}⌺(⍴A)⊢r   → 1.4E¯4 |    0% ⎕⎕⎕
* {+/,A×⍵}⌺(⍴A)⊢r → 1.3E¯3 | +828% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

(The * in the result of cmpx is expected.)

(2) Linearity: the stencil of the sum equals the sum of the stencils.

   A←?3 3⍴17
   B←?3 3⍴17
   ({+/,(A+B)×⍵}⌺(⍴A)⊢r) ≡ ({+/,A×⍵}⌺(⍴A)⊢r) + {+/,B×⍵}⌺(⍴A)⊢r 
1

(3) Middle: If B is zero everywhere except the middle, then {+/,B×⍵}⌺(⍴B)⊢r ←→ mid×r where mid is the middle value.

   B←(⍴A)⍴0 0 0 0 9
   B
0 0 0
0 9 0
0 0 0
   ({+/,B×⍵}⌺(⍴B)⊢r) ≡ 9×r
1

(4) A faster solution.

   A←EdgeDetect
   B←(⍴A)⍴0 0 0 0 9
   C←(⍴A)⍴¯1
   A B C
┌────────┬─────┬────────┐
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
│¯1  8 ¯1│0 9 0│¯1 ¯1 ¯1│
│¯1 ¯1 ¯1│0 0 0│¯1 ¯1 ¯1│
└────────┴─────┴────────┘
   A ≡ B+C
1

Whence:

   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) + {+/,C×⍵}⌺(⍴A)⊢r ⍝ (2)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ ({+/,B×⍵}⌺(⍴A)⊢r) - {+/,⍵}⌺(⍴A)⊢r   ⍝ (0)
1
   ({+/,A×⍵}⌺(⍴A)⊢r) ≡ (9×r) - {+/,⍵}⌺(⍴A)⊢r                ⍝ (3)
1

Putting it all together:

edges3←{
  ⍺←0.7
  mid←⊃EdgeDetect↓⍨⌊2÷⍨⍴EdgeDetect
  1 1↓¯1 ¯1↓(⍺×1⌈(+⌿÷≢),⍵)<|(⍵×1+mid)-{+/,⍵}⌺(⍴EdgeDetect)⊢⍵
}

Comparing the various edges:

   x←(⊂'e←⊃∨/0.2 edges'),¨('3 ' '2 ' '1a' '1 ' '  '),¨⊂'¨r g b'
   cmpx x
  e←⊃∨/0.2 edges3 ¨r g b → 3.4E¯3 |      0%
* e←⊃∨/0.2 edges2 ¨r g b → 4.3E¯3 |    +25%
  e←⊃∨/0.2 edges1a¨r g b → 6.4E¯3 |    +88%
  e←⊃∨/0.2 edges1 ¨r g b → 7.5E¯3 |   +122%
  e←⊃∨/0.2 edges  ¨r g b → 6.5E¯1 | +19022% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
edges original
edges1 uses {+/,stencil×⍵} instead of {+/,⍵×stencil}
edges1a uses (⍺×mean,⍵)<matrix instead of ⍺<matrix÷mean,⍵
edges2 uses {threshold<+/,stencil×⍵}
edges3 uses the faster {+/,⍵} instead of {+/,stencil×⍵} and other maneuvers

Fin

I don’t know if the Finns would be impressed. The exercise has been amusing in any case.