# Shuffle Faster Please!

Andy reported that in the shuffle QA some functions take a long time:

`m9249   `“4½ days so far”
`rankop  `21.5 hours
`m12466  `26.3 hours
`points  `7.2 hours

Background: Shuffle QA is an intensification of the normal QA. The suite of over 1800 QA functions is run under shuffle, whereby every `getspace` (memory allocation) is followed by every APL array in the workspace being shifted (“shuffled”) and by a simple workspace integrity check. The purpose is to uncover memory reads/writes which are rendered unsafe by events which in normal operations are rare.

`m9249`

`m9249` tests `+\[a]`, `×\[a]`, `⌈\[a]`, and `⌊\[a]`. It ran in under 3 seconds in normal operations; for it to take more than 4½ days under shuffle, `getspace` is clearly the dominant resource and hereafter we focus on that.

`m9249` used expressions like:

``````assert (+⍀x) ≡ {⍺+⍵}⍀x
``````

Since `{⍺+⍵}` is not recognized by the scan operator `⍀` and therefore not supported by special code, `{⍺+⍵}⍀x` is done by the general `O(n*2)` algorithm for scans, AND on each scalar the operand `{⍺+⍵}` is invoked and a `getspace` is done. The improved, faster QA does this:

``````F ← {1≥≢⍵:⍵ ⋄ x ⍪ (¯1↑⍵) ⍺⍺⍨ ¯1↑x←⍺⍺ ∇∇ ¯1↓⍵}
assert (+⍀x) ≡ +F x
``````

With such a change, the faster `m9249` runs in 22 minutes under shuffle instead of over 4½ days, without any reduction in coverage.

The number of `getspace` are as follows: `+⍀x` takes `O(1)`. The second expression `{⍺+⍵}⍀x` does `×\1↓⍴x` separate scans, and each one takes `O((≢x)*2) getspace`; the total is therefore `O(((≢x)*2)××/1↓⍴x) ←→ O((≢x)××/⍴x)`. The third expression `+F` is linear in `≢x` and is therefore `O(≢x)` in the number of `getspace`. In `m9249` the largest `x` has size `33 5 7`. The following table shows the orders of complexity and what they imply for this largest `x`:

 `+⍀x` `O(1)` `1` `{⍺+⍵}⍀x` `O((≢x)××/⍴x)` `38115 = 33 × ×/ 33 5 7` `+F x` `O(≢x)` `33`

The ratio between 38115 and 33 goes a long way towards explaining why the time to run `m9249` under shuffle was over 4½ days and is now 22 minutes. (Why does `m9249` still take 22 minutes? It tests for the four functions `+ × ⌈ ⌊`, for various axes, for different datatypes, and for different axis lengths.)

`rankop`

Another shuffle sluggard `rankop` took 47 hours. `rankop` tests expressions of the form `f⍤r⊢y`and `x f⍤r⊢y` for various functions `f`. The techniques used for reducing the number of `getspace` differ from function to function. We look at `x↑⍤r⊢y` as an illustration.

``````assert (4↑⍤1⊢y) ≡ 4{⍺↑⍵}⍤1⊢y
``````

`⍴y` was `2 7 1 8 2 8`. The expression took `7.6e¯5` seconds normally and `7.5` seconds under shuffle. The left hand side requires `O(1) getspace`, the right hand side `O(×/¯1↓⍴x)`. The expression was changed to:

``````assert (4↑⍤1⊢y) ≡ 2 7 1 8 2 4↑y
``````

Feels like cheating, doesn’t it? 🙂 The new expression takes 0.115 seconds under shuffle.

`m12466`

`m12466` tests `n⌈/[a]x` (and `n⌊/[a]x`). Previously, it did this by comparing `n⌈/[a]x` against `n{⍺⌈⍵}/[a]x` for `x` with shape `7 17 13 11`, for each of the four axes, for two different values of `n`, and for the 1-, 2-, 4-, and 8-byte numeric datatypes. It required 5238426 `getspace` and 26.3 hours to run under shuffle.

``F←{p⍉⍺⍺⌿(⊂(⎕io-⍨⍳|⍺)∘.+⍳(1-|⍺)+(⍴⍵)[⍵⍵])⌷(⍋p←⍒⍵⍵=⍳⍴⍴⍵)⍉⍵}``

`F` is a dyadic operator. The left operand `⍺⍺` is `⌈` or `⌊`; the right operand `⍵⍵` is the axis; and `n(⌈ F a)x` is the same as `n⌈/[a]x`. `n(⌈ F a)x` mainly does `n⌈⌿x`; other axes are handled by transposing the axis of interest to be the first, then doing `n⌈⌿x`, then transposing the axes back into the required order. `n(⌈ F a)x` is much more complicated than `n{⍺⌈⍵}/[a]x` but has the advantage of using only `O(1) getspace`.

The revamped `m12466` function uses 13717 `getspace` and 2 minutes to run under shuffle.

`points`

`points` is a function to test monadic format (`⍕`) for different values of `⎕pp`.

``````points←{⍺←17 ⋄ ⎕pp←⍺
tt←{(⌽∨\⌽⍵≠⍺)/⍵}                   ⍝ trim trailing ⍺s.
~0∊∘.{                             ⍝ all pairs (⍳⍵)
num←'0'tt↑,/⍕¨⍺'.'⍵              ⍝ eg '9.3'
num≡⍕⍎num                        ⍝ check round trip
}⍨⍳⍵
}``````

The expression is `14 15 16 17 points¨ 100`, meaning that for each of the 100 numbers `num` as text,

```  1.1   1.2   1.3   1.4   1.5     1.100   2.1   2.2   2.3   2.4   2.5     2.100   3.1   3.2   3.3   3.4   3.5 ... 3.100 ...  99.1  99.2  99.3  99.4  99.5    99.100 100.1 100.2 100.3 100.4 100.5   100.100```

a “round-trip” `num≡⍕⍎num` is evaluated. The expression requires 1.3 million `getspace` and 7.2 hours to run under shuffle.

A much more efficient version with respect to `getspace` is possible. Let `x` be a vector. `⍕¨x` requires `O(≢x) getspace`; `⍕x` requires `O(1) getspace` and, like `⍕¨x`, formats each individual element of `x` independently.

``````points1←{
⎕io←⎕ml←1
tt←{(⌽∨\⌽⍵≠⍺)/⍵}                   ⍝ trim trailing ⍺s
x←1↓∊(' ',¨s,¨'.')∘.,'0'tt¨s←⍕¨⍳⍵  ⍝ numbers as text
y←⍎x
{⎕pp←⍵ ⋄ x≡⍕y}¨⍺                   ⍝ check round trip
}``````

`14 15 16 17 points1 100` requires 11050 `getspace` and less than 2 minutes to run under shuffle.

What to do?

Andy says that the shuffle QA takes over 2000 hours (!). The shuffle sluggards will be tackled as done here by using more parsimonious APL expressions to compare against the aspect being QA-ed. Going forward, a new QA function will be run under shuffle as it is being developed. The developers will be unlikely to tolerate a running time of 4½ days.