Diane’s Lasagne Problem

Making Lasagne

Participants in the SA2 Performance Tuning workshop at the Dyalog ’18 User Meeting were encouraged to bring their own problems for the group to work on. Diane Hymas of ExxonMobil brought a good one. The one-liner computation is as follows:

lasagne0 ← {groups {+⌿⍵}⌸ amts ×[⎕io] spices[inds;]}

where

   n ← 8e5
   spices ← ?6000 44⍴0
   groups ← +\(16↑1 2)[?n⍴16]
   inds   ← ?n⍴≢spices
   amts   ← ?n⍴0

Applying lasagne0 to this dataset:

   ⍴ lasagne0 ⍬
100015 44
   ≢ ∪ groups
100015

   )copy dfns wsreq cmpx

   wsreq 'lasagne0 ⍬'
844799820
   cmpx  'lasagne0 ⍬'
2.12E¯1

The problem with lasagne0 is space rather than time. The 845 MB required for this dataset may be acceptable, but we can be called upon to cook up large batches of lasagne in a smallish workspace, on a machine with limited RAM. (Large n and large ≢∪groups.)

All benchmarks in this document were run in Dyalog APL version 17.0, in a 2 GB workspace, on a machine with generous RAM.

Solutions

Marshall Lochbaum solved the problem. The alternative solutions are as follows:

lasagne0 ← {groups {+⌿⍵}⌸ amts ×[⎕io] spices[inds;]}
lasagne1 ← {↑ (groups{⊂⍵}⌸amts) {+⌿⍺×[⎕io]spices[⍵;]}¨ groups{⊂⍵}⌸inds}
lasagne2 ← {↑ (groups{⊂⍵}⌸amts)      {⍺+.×spices[⍵;]}¨ groups{⊂⍵}⌸inds}
lasagne3 ← {↑ {amts[⍵]+.×spices[inds[⍵];]}¨ {⊂⍵}⌸groups}

lasagne0 is the original expression; lasagne1 and lasagne2 were derived by Marshall during the workshop; lasagne3 was suggested by a participant in the workshop. The four functions produce matching results. Comparing the space and time:

space (MB)           time
lasagne0 845 2.29e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
lasagne1 74 3.60e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
lasagne2 74 2.39e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
lasagne3 74 2.93e¯1 ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

lasagne0 v  lasagne1

Nearly all of the space required to evaluate lasagne0 is accounted for by the space for computing the right argument to key:

   wsreq 'lasagne0 ⍬'
844799820

   wsreq 'amts ×[⎕io] spices[inds;]'
844799548

In fact, the array spices[inds;], all by itself, is already very large. It has shape (⍴inds),1↓⍴spices (≡ 8e5 44), each item requiring 8 bytes.

   wsreq 'spices[inds;]'
281599556

   ⍴ spices[inds;]
800000 44

   8 × ×/ ⍴ spices[inds;]
281600000

   qsize←{⎕size '⍵'}    ⍝ # bytes for array ⍵
   qsize spices[inds;]
281600040

lasagne1 avoids creating these large intermediate results, by first partitioning the arguments (groups{⊂⍵}⌸amts and groups{⊂⍵}⌸inds) and then applying a computation to each partition. In that computation, the operand function {+⌿⍺×[⎕io]spices[⍵;]}, is a partition of amts and is the corresponding partition of inds.

lasagne1, lasagne2 and lasagne3 require the same amount of space to run, so the comparison among them is on time.

lasagne1 v  lasagne2

The only change is from +⌿⍺×[⎕io]spices[⍵;] to ⍺+.×spices[⍵;], which are equivalent when is a vector. The interpreter can compute +.× in one go rather than doing +⌿ separately after doing ×[⎕io]; in such computation the interpreter can and does exploit the additional information afforded by +.× and is faster by a factor of 1.5 (= 2.39 ÷⍨ 3.60).

lasagne2 v  lasagne3

The idea in lasagne3 is doing one key operation rather than the two in lasagne2. Therefore, the changes between lasagne2 v lasagne3 are:

lasagne2 groups{⊂⍵}⌸amts
groups{⊂⍵}⌸inds
spices[⍵;]
lasagne3 {⊂⍵}⌸groups amts[⍵] spices[inds[⍵];]

All three key operations involve {⊂⍵}⌸ with groups as the key, and are roughly equally fast, each taking up no more than 7% of the total time.

   cmpx 'groups{⊂⍵}⌸amts' 'groups{⊂⍵}⌸inds' '{⊂⍵}⌸groups'
  groups{⊂⍵}⌸amts → 1.69E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* groups{⊂⍵}⌸inds → 1.39E¯2 | -18% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* {⊂⍵}⌸groups     → 1.36E¯2 | -20% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

lasagne3 is doing one less key operation than lasagne2 in exchange for doing, for each of the ≢∪groups (= 100015) executions of the operand function, amt[⍵] v and spices[ind[⍵];] v spices[⍵;]. Indexing is by no means slow, but it’s not as fast as references to and . Therefore, lasagne2 is faster.

The trade-off may differ for different values of groups. In this case groups are small-range integers so operations using it as the key value are fast.