Fast APL

An overview of topics related to writing performant code and optimising existing code.

Audience

APLers

Goals for code

In APL, the ability to express similar ideas (or even the exact same idea) in multiple ways is quite pronounced.

This double-edged sword of language is both one of the most enjoyable parts of writing (choosing an expression which suits oneself), but it is also a source of frustration ("How can I express that better?" "What is a better way to put that?" "What is the best way to express this idea?").

"Better code"

  • Aaron Hsu: How much (money) are you willing to bet on this code?
  • Roger Hui: Monument quality code

  • Accurate
  • Reliable

Variables

Reference: Dyalog Webinars: APL CodeGolf Autumn Tournament

  • Accurate
  • Reliable

  • Readable: Can a stranger understand it?
  • Fast: Does it perform in reasonable time using reasonable resources?
  • Short: APLers need not be convinced
  • Balanced

Here we advocate for balanced code, as this is desirable in production.

Fast APL

  • Analysis and profiling
  • Mitigating hotspots through
    • Implementing mechanical sympathy
    • Using special cased code (The Interpretive Advantage)
    • Compiling chunks
    • Outsourcing jobs
  • Algorithms and primitive complexity

Analysis and profiling

Rule ⎕IO

Do not optimise code which has not been measured as slow in realistic situations.

Optimised code is often longer and much less readable.

dfns.life

In [2]:
R¯2¯15 7(3 3⍴⍳9)2 3 4 5 8
lifeNested{1 .3 4=+/,¯1 0 1∘.¯1 0 1∘.⊖⊂}
lifeFlat{(3=c)4=c+,[1 2]¯1 0 10 9999 2¯1 0 10 99}
In [3]:
]runtime -c "lifeFlat R" "lifeNested R"
Out[3]:
lifeFlat R → 1.1E¯5 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ lifeNested R → 1.4E¯5 | +27% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Code analysis tools

⎕PROFILE   ]Profile
dfns.cmpx  ]Runtime

$${(\sum_{n=1}^{N}A_n})\div{N}$$

$$\sum_{n=1}^{N}({A_n}\div{N})$$

$${(\sum_{n=1}^{N}A_n})\div{N}$$

In [4]:
]dinput
avg1{
  N    ⍝ Count elements
  s+   ⍝ Sum elements 
  s÷N     ⍝ Sum divided by count
}

$$\sum_{n=1}^{N}({A_n}\div{N})$$

In [5]:
]dinput
avg2{
  N    ⍝ Count elements 
  n÷N   ⍝ Array divided by count
  +n     ⍝ Sum
}
In [6]:
)copy dfns cmpx
n?1000
cmpx 'avg1 n' 'avg2 n'
Out[6]:
/opt/mdyalog/17.1/64/unicode/ws/dfns.dws saved Tue Nov 12 00:26:19 2019
Out[6]:
avg1 n → 2.2E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ avg2 n → 4.1E¯6 | +87% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
In [9]:
⎕PROFILE 'clear'
⎕PROFILE 'start'
avg1 n
⎕PROFILE 'stop'
]Profile -lines
Out[9]:
500.5
Out[9]:
Total time: 0.0 msec Element msec % Calls #.avg1[2] 0.0 50.0 1 #.avg1[1] 0.0 30.0 1 #.avg1[0] 0.0 10.0 1 #.avg1[3] 0.0 10.0 1
In [10]:
repeat1e4
_Profile{1  _⎕PROFILE'clear'  _⎕PROFILE 'start'  r⍺⍺¨⍴⊂  _⎕PROFILE 'stop'}
In [11]:
repeat avg1 _Profile n
⎕VR'avg1'
]profile -lines
Out[11]:
∇ avg1←{ [1] N←≢⍵ ⍝ Count elements [2] s←+⌿⍵ ⍝ Sum elements [3] s÷N ⍝ Sum divided by count [4] } ∇
Out[11]:
Total time: 22.1 msec Element msec % Calls #.avg1[2] 11.5 52.0 10000 #.avg1[1] 5.3 23.9 10000 #.avg1[3] 3.4 15.4 10000 #.avg1[0] 1.3 6.0 10000
In [12]:
repeat avg2 _Profile n
⎕VR'avg2'
]profile -lines
Out[12]:
∇ avg2←{ [1] N←≢⍵ ⍝ Count elements [2] n←⍵÷N ⍝ Array divided by count [3] +⌿n ⍝ Sum [4] } ∇
Out[12]:
Total time: 43.8 msec Element msec % Calls #.avg2[2] 18.8 42.9 10000 #.avg2[3] 18.1 41.4 10000 #.avg2[1] 5.0 11.4 10000 #.avg2[0] 1.3 3.0 10000

Relatively easy gains

Avoid nested arrays or mixed-type arrays

In [13]:
3 4⎕A ⍝ 3 4  ⍴  'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L'
Out[13]:
ABCD EFGH IJKL
In [16]:
⍝ 2 3  ⍴  p0 p1 p2 p3 p4 p5
⍝ p0 → 2 2  ⍴  1 2 3 4
⍝ p1 → 5  ⍴  'a' 'b' 'c' 'd' 'e'
⍝ p2 → 2 3  ⍴  p6 p6 p6 p6
⍝ p3 → ⍬  ⍴  1
⍝ p4 → 2  ⍴  p7 p8
⍝ p5 → ⍬  ⍴  3
⍝ p6 → 4  ⍴  'w' 'o' 'r' 'd'
⍝ p7 → ⍬  ⍴  2
⍝ p8 → ⍬  ⍴  'b'
2 3(2 2⍴⍳4)'abcde'(2 3⍴⊂'word')1 (2'b') 3

Out[16]:
┌───┬─────┬────────────────┐ │1 2│abcde│┌────┬────┬────┐│ │3 4│ ││word│word│word││ │ │ │├────┼────┼────┤│ │ │ ││word│word│word││ │ │ │└────┴────┴────┘│ ├───┼─────┼────────────────┤ │1 │2 b │3 │ └───┴─────┴────────────────┘
In [17]:
5posNest?500⍴⊂310
5posFlatposNest
Out[17]:
┌──────┬─────┬─────┬─────┬─────┐ │7 8 10│3 6 8│7 1 5│2 4 4│9 4 1│ └──────┴─────┴─────┴─────┴─────┘
Out[17]:
7 8 10 3 6 8 7 1 5 2 4 4 9 4 1
In [18]:
]runtime -c "0.5*⍨+/2*⍨-⍤1⍤1 99⍨posFlat" "0.5*⍨+/¨2*⍨∘.-⍨posNest"
Out[18]:
0.5*⍨+/2*⍨-⍤1⍤1 99⍨posFlat → 7.6E¯3 | 0% ⎕⎕⎕ 0.5*⍨+/¨2*⍨∘.-⍨posNest → 1.2E¯1 | +1423% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ ⎕⎕⎕⎕⎕⎕⎕⎕

Use inverted tables: Dyalog '18: Inverted Tables

8

Do work on large arrays where possible

In [19]:
b1=?100 1002
]runtime -c "+/,3<{+/,⍵}⌺3 3⊢b" "+/,{3<+/,⍵}⌺3 3⊢b"
Out[19]:
+/,3<{+/,⍵}⌺3 3⊢b → 3.1E¯5 | 0% +/,{3<+/,⍵}⌺3 3⊢b → 3.8E¯2 | +121050% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ ⎕
In [20]:
A?1e41e2
]runtime -c "{≢⍵}⌸A" "{≢⊢⍵}⌸A"
Out[20]:
{≢⍵}⌸A → 9.9E¯6 | 0% ⎕⎕ {≢⊢⍵}⌸A → 1.8E¯4 | +1660% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Dyalog idioms

Search: dyalog help idiom list

In [21]:
⍝ Sorting idioms
]runtime -c "{(⊂⍋⍵)⌷⍵}A" "{⍵⌷⍨⊂⍋⍵}A"
Out[21]:
{(⊂⍋⍵)⌷⍵}A → 9.8E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ {⍵⌷⍨⊂⍋⍵}A → 2.7E¯5 | +171% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Use ⎕CT←0 if possible

Algorithms and Primitive Complexity

Hsu, A.W., 2019. A data parallel compiler hosted on the GPU.

APL makes it easy to reason about algorithms.

https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations


Primitive complexity:

+     ⍝ O(n)
|     ⍝ O(n)
∘.f   ⍝ O(n*2)
In [22]:
PT0PrimesTil{2=+0=∘.|}   ⍝ Primes from 1 to ⍵ using Modulo and Reduction
PT1(⊢~∘.×)1↓⍳                 ⍝ Without Products                             
)copy dfns sieve pco                                                           
PT2sieve 1↓⍳                   ⍝ Sieve of Eratosthenes                        
PT310 pco 1,⊢                 ⍝ dfns.pco (lookup table)
Out[22]:
/opt/mdyalog/17.1/64/unicode/ws/dfns.dws saved Tue Nov 12 00:26:19 2019
In [23]:
⎕VR'sieve'
Out[23]:
∇ sieve←{ ⍝ Sieve of Eratosthenes. [1] ⍺←⍬ ⍝ Default no primes yet. [2] nxt←1↑⍵ ⍝ Next prime, and [3] msk←0≠nxt|⍵ ⍝ ... mask of non-multiples. [4] ∧/1↓msk:⍺,⍵ ⍝ All non multiples - finished. [5] (⍺,nxt)∇ msk/⍵ ⍝ Sieve remainder. [6] } ∇
In [24]:
_Time{0 0 0 0.2 cmpx ⍺⍺,' ',⍕}
In [25]:
)copy sharpplot
Out[25]:
/opt/mdyalog/17.1/64/unicode/ws/sharpplot.dws saved Tue Nov 12 00:23:00 2019
In [26]:
 {key}Plot data;d;n;s
 :If 0=⎕NC'key'
     key''
 :EndIf
 s⎕NEW SharpPlot
 ndata
 :For d :In ⊃⌽data
     s.DrawLineGraph d n
 :EndFor
 s.SetKeyText key
 View s

In [27]:
n5↓⌊10*0.1×⍳20
'Modulo reduction' 'Without products' 'Sieve' 'dfns.pco' Plot n{⍺⍵}('PT0'_Time¨n)('PT1'_Time¨n)('PT2'_Time¨n)('PT3'_Time¨n)
Out[27]:
3 5 6 7 10 12 15 19 25 31 39 50 63 79 100
Out[27]:
Created by Causeway SVG engine - SharpPlot v3.62.2 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 0 0.000005 0.00001 0.000015 0.00002 0.000025 0.00003 0.000035 for X-axis labels 0 10 20 30 40 50 60 70 80 90 100 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Start of Line Chart =========== Points follow ... Line Start of Line Chart =========== Points follow ... Line Start of Line Chart =========== Points follow ... Clipped Line Key ===== Key - Line Key - Line Key - Line Key - Line Modulo reduction Without products Sieve dfns.pco Reset to original origin
In [28]:
n5↓⌊10*0.1×5+⍳18
⍝ 'Without products' 'Sieve' Plot n{⍺⍵}('PT1'_Time¨n)('PT2'_Time¨n)
'Modulo reduction' 'Without products' 'Sieve' 'dfns.pco' Plot n{⍺⍵}('PT0'_Time¨n)('PT1'_Time¨n)('PT2'_Time¨n)('PT3'_Time¨n)
Out[28]:
12 15 19 25 31 39 50 63 79 100 125 158 199
Out[28]:
Created by Causeway SVG engine - SharpPlot v3.62.2 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 0 0.00002 0.00004 0.00006 0.00008 0.0001 0.00012 0.00014 for X-axis labels 0 20 40 60 80 100 120 140 160 180 200 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Start of Line Chart =========== Points follow ... Clipped Line Start of Line Chart =========== Points follow ... Line Start of Line Chart =========== Points follow ... Line Key ===== Key - Line Key - Line Key - Line Key - Line Modulo reduction Without products Sieve dfns.pco Reset to original origin

Fast APL

  • Analysis and profiling
  • Mitigating hotspots through
    • Implementing mechanical sympathy
    • Using special cased code (The Interpretive Advantage)
    • Compiling chunks
    • Outsourcing jobs
  • Algorithms and primitive complexity