Mind Boggling Performance

Posted on March 17, 2026 by Brian Becker

Or is it Minding Boggle Performance?

Better late than never? This was a blog post I started to write during COVID-19 and now I’ve finally gotten around to finishing it.

In the 2019 APL Problem Solving Competition, we presented a problem to solve the Boggle game. In Boggle, a player tries to make as many words as possible from contiguous letters in a 4×4 grid, with the stipulation that you cannot reuse a position on the board.

Rich Park’s webinar from 17 October 2019 presents, among other things, a very good discussion and comparison of two interesting solutions submitted by Rasmus Précenth and Torsten Grust. As part of that discussion, Rich explores the performance of their solutions. After seeing that webinar, I was curious about how my solution might perform.

Disclaimer

Please take note that performance was not mentioned as one of the criteria for this problem other than the implicit expectation that code completes in a reasonable amount of time. As such, this post is in no way intended to criticize anyone’s solutions – in fact, in many cases I’m impressed by the elegance of the solutions and their application of array-oriented thinking. I have no doubt that had we made performance a primary judging criterion, people would have taken it into consideration and possibly produced somewhat different code.

Goals

I started writing APL in 1975 at the age of 14 and “grew up” in the days of mainframe APL when CPU cycles and memory were precious commodities. This made me develop an eye towards writing efficient code. In developing my solution and writing this post, I had a few goals in mind:

Use straightforward algorithm optimizations and not leverage or avoid any specific features in the interpreter. Having a bit of understanding about how APL stores its data helps though.
Illustrate some approaches to optimization that may be generally applicable.
Encourage discussion and your participation. I don’t present my solution as the paragon of performance. I’m sure there are further optimizations that can be made and hope you’ll (gently) suggest some.

The task was to write a function called FindWords that has the syntax:

      found←words FindWords board

where:

words is a vector of words. We used Collins Scrabble Words, a ≈280,000-word word list used by tournament Scrabble™ players. We store this in a variable called AllWords. Note that single letter words like “a” and “I” are not legitimate Scrabble words.
board is a matrix where each cell contains one or more letters. A standard Boggle board is 4×4.
found, the result, is a vector containing the subset of words that can be made from board without revisiting any cells.

Although the actual Boggle game uses only words of 3 letters or more, for this problem we permit words of 2 or more letters.

Here’s an example of a 2×2 board:

     AllWords FindWords ⎕← b2← 2 2⍴'th' 'r' 'ou' 'gh'
┌──┬──┐
│th│r │
├──┼──┤
│ou│gh│
└──┴──┘
┌──┬───┬────┬─────┬─────┬──────┬───────┐
│ou│our│thou│rough│routh│though│through│
└──┴───┴────┴─────┴─────┴──────┴───────┘

First, let’s define some variables that we’ll use in our exploration:

      b4← 4 4⍴ 't' 'p' 'qu' 'a' 's' 'l' 'g' 'i' 'r' 'u' 't' 'e' 'i' 'i' 'n' 'a' ⍝ 4×4 board
      b6← 6 6⍴'jbcdcmvueglxriybgeiganuylvonxkfeoqld' ⍝ 6×6 board

If you’re using Dyalog v20.0 or later, you can represent these boards using array notation:

⍝ using array notation with single-line input:
      b4←['t' 'p' 'qu' 'a' ⋄ 's' 'l' 'g' 'i' ⋄ 'r' 'u' 't' 'e' ⋄ 'i' 'i' 'n' 'a']
      b6←['jbcdcm' ⋄ 'vueglx' ⋄ 'riybge' ⋄ 'iganuy' ⋄ 'lvonxk' ⋄ 'feoqld']

⍝ or, using array notation with multi-line input:
      b4←['t' 'p' 'qu' 'a'
          'slgi'
          'rute'
          'iina']

      b6←['jbcdcm'
          'vueglx'
          'riybge'
          'iganuy'
          'lvonxk'
          'feoqld']

The representation does not affect the performance or the result:

      b4 b6
┌──────────┬──────┐
│┌─┬─┬──┬─┐│jbcdcm│
││t│p│qu│a││vueglx│
│├─┼─┼──┼─┤│riybge│
││s│l│g │i││iganuy│
│├─┼─┼──┼─┤│lvonxk│
││r│u│t │e││feoqld│
│├─┼─┼──┼─┤│      │
││i│i│n │a││      │
│└─┴─┴──┴─┘│      │
└──────────┴──────┘

There were 9 correct solutions submitted for this problem. We’ll call them f1 through f9 – my solution is f0. Now let’s run some comparative timings using cmpx from the dfns workspace. cmpx notes when any later expression returns a different result from the first expression. We take the tally (≢) of the resulting word lists to make sure the expressions all return the same result; we assume that the sets of words are the same if the counts are the same. These timings were done using Dyalog v20.0 with a maximum workspace (MAXWS) of 1GB running under Windows 11 Pro. To keep the expressions brief, I bound AllWords as the left argument to each of the solution functions:

      f0←≢AllWords∘#.Brian.Problems.FindWords

To make it easier to run timings, I wrote a simple function to call cmpx with the solutions of my choosing (the default is all solutions).

      )copy dfns cmpx
      time←{⍺←¯1+⍳10 ⋄ cmpx('f',⍕,' ',⍵⍨)¨⍺}

This allows me to compare any 2 or more solutions, or by default, all solutions on a given board variable name.
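To see what time passes to cmpx, the train can be tried on its own. This is only a sketch: 'b4'⍨ is written in place of ⍵⍨, and it assumes a Dyalog version (18.0 or later) in which ⍨ accepts an array operand.

```apl
      ⍝ Sketch: the expressions that time builds for solutions 0, 1 and 10
      ⍝ 'b4'⍨ is a constant function standing in for ⍵⍨ inside the dfn
      ↑('f',⍕,' ','b4'⍨)¨0 1 10
f0 b4 
f1 b4 
f10 b4
```

Each row is one expression; cmpx executes them in turn and reports the timings relative to the first.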

      time 'b4' ⍝ try a "standard" 4×4 Boggle board
  f0 b4 → 1.2E¯2 |      0%
  f1 b4 → 2.3E¯1 |  +1791% ⎕⎕
  f2 b4 → 4.3E¯1 |  +3491% ⎕⎕⎕
  f3 b4 → 7.3E¯1 |  +5941% ⎕⎕⎕⎕⎕                                    
  f4 b4 → 5.6E0  | +46600% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b4 → 1.1E0  |  +8758% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f6 b4 → 8.9E¯1 |  +7325% ⎕⎕⎕⎕⎕⎕                                   
  f7 b4 → 1.1E0  |  +9275% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f8 b4 → 4.2E0  | +34750% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕           
  f9 b4 → 2.0E0  | +16291% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Here are the timings on the 6×6 sample:

      0 1 2 3 4 5 6 7 9 time'b6'
  f0 b6 → 2.2E¯2 |       0%                                          
  f1 b6 → 3.7E¯1 |   +1577%                                          
  f2 b6 → 1.0E0  |   +4581%                                          
  f3 b6 → 1.5E0  |   +6872%                                          
  f4 b6 → 1.6E2  | +705950% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b6 → 2.3E0  |  +10409% ⎕                                        
  f6 b6 → 1.9E0  |   +8463%                                          
  f7 b6 → 2.0E0  |   +9131% ⎕
  f9 b6 → 2.4E0  |  +10822% ⎕

f8 is excluded as it would cause a WS FULL in my 1GB workspace.

Why is f0 roughly 17 to 19 times faster than the next fastest solution, f1? I didn’t set out to make FindWords fast; it just turned out that way. Let’s take a look at the code…

     ∇ r←words FindWords board;inds;neighbors;paths;stubs;nextcells;mwords;mask;next;found;lens;n;m;map
[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell
[3]    paths←⊂¨,inds                                     ⍝ initial paths
[4]    stubs←,¨,board                                    ⍝ initial stubs of words
[5]    nextcells←neighbors∘{⊂¨(⊃⍺[¯1↑⍵])~⍵}              ⍝ append unused neighbors to path
[6]    mwords←⍉↑words                                    ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=
[7]    mask←mwords[1;]∊⊃¨,board                          ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board              ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                                      ⍝ keep those words
[10]   mwords/⍨←mask                                     ⍝ keep them in the matrix form as well
[11]   r←words∩stubs                                     ⍝ seed result with any words that may already be formed from single cell
[12]   :While (0∊⍴paths)⍱0∊⍴words                        ⍝ while we have both paths to follow and words to look at
[13]       next←nextcells¨paths                          ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                     ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next             ⍝ append the next letters to each stub
[16]       r,←words∩stubs                                ⍝ add any matching words
[17]       mask←(≢words)⍴0                               ⍝ build a mask to remove word beginnings that don't match any stubs
[18]       found←(≢stubs)⍴0                              ⍝ build a mask to remove stubs that no words begin with
[19]       lens←≢¨stubs                                  ⍝ length of each stub
[20]       :For n :In ∪lens                              ⍝ for each unique stub length
[21]           m←n=lens                                  ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords                 ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                               ⍝ words that match
[24]           found[(∨/map)/⍸m]←1                       ⍝ stubs that match
[25]       :EndFor
[26]       paths/⍨←found                                 ⍝ keep paths that match
[27]       stubs/⍨←found                                 ⍝ keep stubs that match
[28]       words/⍨←mask                                  ⍝ keep words that may yet match
[29]       mwords/⍨←mask                                 ⍝ keep matrix words that may yet match
[30]   :EndWhile
[31]   r←∪r
     ∇

Attacking the Problem

Intuitively, this felt like an iterative problem. A mostly-array-oriented solution might be to generate character vectors made up from the contents of all paths in board and then do a set intersection with words. But that would be horrifically inefficient – there are over 12 million paths in a 4×4 matrix and, in the case of b4, there are only 188 valid words. What about a recursive solution (many of the submissions used recursion)? I tend to avoid recursion unless there are clear advantages to using it, and in this case I didn’t see any advantages, clear or otherwise. So, iteration it was…
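The size of that search space can be checked with a short recursive count. The following is only a sketch: CountFrom and PathCount are names invented here (they are not part of any submission), ⎕IO←1 is assumed as in the rest of this post, and single-cell paths are included in the count.

```apl
      ⎕IO←1
      CountFrom←{                        ⍝ ⍺: neighbour matrix, ⍵: path so far
          next←(⊃⍺[¯1↑⍵])~⍵              ⍝ unvisited neighbours of the last cell
          0=≢next:1                      ⍝ dead end: count just this path
          1++/⍺∘∇¨(⊂⍵),¨⊂¨next           ⍝ this path plus every extension of it
      }
      PathCount←{                        ⍝ ⍵: board shape, e.g. 2 2
          inds←⍳⍵
          nb←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0   ⍝ same expression as line [2]
          +/nb∘CountFrom¨⊂¨,inds         ⍝ start a path at every cell
      }
      PathCount 2 2
64
```

On a 2×2 board every cell touches every other cell, so the count is 4+12+24+24. PathCount 4 4 should reproduce the figure quoted above, but being recursive it takes a while to get there.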

I decided to use two parallel structures to keep track of progress:

paths – the paths traversed through the board
stubs – the word “stubs” built from the contents of the cells in paths

paths is initialized to the indices of the board, and stubs is initialized to the contents of each cell. Then iterate:

Keep any stubs that are in words
Append the contents of each path’s unvisited neighboring cells to its stub, resulting in new lists of paths and stubs
Repeat until there’s nothing left to look at

Setup

First, I need to find the adjacent cells for each cell in board.

[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell

You might recognize line [2] as a stencil-like (⌺) operation. Why, then, didn’t I use stencil? To be honest, it didn’t occur to me at the time – I knew how to code what I needed without using stencil. As it turns out, for this application, stencil is slower. The stencil expression is shorter, more “elegant”, and possibly more readable (assuming you know how stencil works), but it takes more than twice the time. Granted, this line only runs once per invocation so the performance improvement from not using it is minimal.

      inds←⍳4 4
      ]RunTime -c '(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0' '{⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds'
                                                                                              
  (,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0 → 2.2E¯5 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                       
  {⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds       → 5.0E¯5 | +127% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

I wrote a helper function nextcells which, given a path, returns the unvisited cells adjacent to the last cell in the path. For example, if we have a path that starts at board[1;1] and continues to board[2;2], then the next unvisited cells for this path are given by:

      nextcells (1 1)(2 2)
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│
││1 2│││1 3│││2 1│││2 3│││3 1│││3 2│││3 3││
│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│
└─────┴─────┴─────┴─────┴─────┴─────┴─────┘

A contributor to improved performance is set up next. I created a parallel transposed matrix copy of words.

[6]    mwords←⍉↑words ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=

Why create another version of words and why is it transposed?

In general, it’s faster to operate on simple arrays.
Simple arrays – arrays containing only flat, primitive data without any nested elements – are stored in a single contiguous block of memory. Elements are laid out in row-major order, meaning the last dimension changes fastest. For a 2D matrix, the first row is stored left-to-right, then the second row, and so on. Transposing the word matrix makes prefix searching, as we look for candidates that could become valid words, much more efficient. Consider a matrix consisting of the words “THE” “BIG” “DOG”. If stored one word per row, the interpreter has to “skip” to find the first letter in each word. However, in a column-oriented matrix the first letters are next to one another and likely to be in cache, making them much quicker to access.
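As a small illustration of the layout described above, ravel (,) lists the elements of a matrix in row-major order, which gives a feel for which letters end up next to each other. This is a sketch for intuition only; it does not inspect the interpreter's actual memory.

```apl
      words←'THE' 'BIG' 'DOG'
      ,↑words          ⍝ one word per row: each word's letters stay together
THEBIGDOG
      ,⍉↑words         ⍝ one word per column: the first letters are adjacent
TBDHIOEGG
      1↑⍉↑words        ⍝ every 1-letter prefix is simply the first row
TBD
```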

Things Run Faster If You Do Less Work

Smaller searches are generally faster than larger ones. If we pare down words and stubs as we progress, we will perform smaller searches. The first pass at minimizing the data to be searched is done during setup – we remove any words that don’t begin with a first letter of any of board’s cells as well as words that contain letters not found in board:

[7]    mask←mwords[1;]∊⊃¨,board               ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board   ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                           ⍝ keep those words
[10]   mwords/⍨←mask                          ⍝ keep them in the matrix form as well
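Here is how those two masks behave on a toy case. The five-word list is made up for illustration; the board is the 2×2 example from earlier, and ⎕IO←1 is assumed.

```apl
      ⎕IO←1
      board←2 2⍴'th' 'r' 'ou' 'gh'
      words←'our' 'the' 'rough' 'zoo' 'hut'     ⍝ made-up word list
      mwords←⍉↑words                            ⍝ one word per column
      ⊢mask←mwords[1;]∊⊃¨,board                 ⍝ begins with the first letter of a cell?
1 1 1 0 0
      ⊢mask←mask\∧⌿(mask/mwords)∊' ',∊board     ⍝ ...and uses only letters on the board?
1 0 1 0 0
```

Note that “hut” is dropped by the first test even though h appears on the board, because no cell begins with h. The second test only examines the words that survived the first, and mask\ expands its result back to the full length of words.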

For board b4, this reduces the number of words to be searched from 267,752 to 16,247 – a ~94% reduction. Then we iterate, appending each path’s next unvisited cells and creating new stubs from the updated paths:

[13]       next←nextcells¨paths                      ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                 ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next         ⍝ append the next letters to each stub

Append any stubs that are in words to the result:

[16]       r,←words∩stubs                                ⍝ add any matching words

Because a cell can have more than one letter, we might have stubs of different lengths, so we need to iterate over each unique length:

[19]       lens←≢¨stubs           ⍝ length of each stub
[20]       :For n :In ∪lens       ⍝ for each unique stub length

Because we’re doing prefix searching, the inner product ∧.= can tell us which stubs match prefixes of which words. Now we can see the reason for creating mwords. Since the data in mwords is stored in “raveled” format, n↑mwords quickly returns a matrix of all n-length prefixes of words:

[21]           m←n=lens                    ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords   ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                 ⍝ words that match
[24]           found[(∨/map)/⍸m]←1         ⍝ stubs that match
[25]       :EndFor
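A toy run of lines [22] to [24] may make the inner product easier to picture. The words and stubs here are made up for illustration, and all the stubs have length 2, so there is no need for the m mask:

```apl
      words←'though' 'our' 'the'
      mwords←⍉↑words                 ⍝ 6×3 matrix: one word per column, space-padded
      stubs←'th' 'gh'
      ⊢map←(↑stubs)∧.=2↑mwords       ⍝ rows are stubs, columns are words
1 0 1
0 0 0
      ∨⌿map                          ⍝ words that begin with some stub
1 0 1
      ∨/map                          ⍝ stubs that begin some word
1 0
```

Here “our” would be removed from words and “gh” from stubs before the next iteration.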

We then use our two Boolean arrays, found and mask, to pare down paths/stubs and words/mwords respectively. If we look at the number of words and stubs at each step, we can see that the biggest performance gain is realized by doing less work:

┌──────────────────┬───────┬──────┐
│Phase             │≢words │≢stubs│
├──────────────────┼───────┼──────┤
│Initial List      │267,752│     0│
├──────────────────┼───────┼──────┤
│After Initial Cull│ 16,247│    16│
├──────────────────┼───────┼──────┤
│After 2-cell Cull │  7,997│    56│
├──────────────────┼───────┼──────┤
│After 3-cell Cull │  2,736│   152│
├──────────────────┼───────┼──────┤
│After 4-cell Cull │  1,159│   178│
├──────────────────┼───────┼──────┤
│After 5-cell Cull │    371│   119│
├──────────────────┼───────┼──────┤
│After 6-cell Cull │     87│    42│
├──────────────────┼───────┼──────┤
│After 7-cell Cull │     16│    10│
├──────────────────┼───────┼──────┤
│After 8-cell Cull │      2│     1│
├──────────────────┼───────┼──────┤
│All Done          │      0│     0│
└──────────────────┴───────┴──────┘

Does mwords Make Much of a Difference?

As an experiment, I decided to write a version, f10, that does not use the transposed word matrix mwords (it still does the words and stubs culling). I compared it to my original version, f0, and the fastest submitted version, f1:

      0 1 10 time 'b4'
  f0  b4 → 1.4E¯2 |     0% ⎕⎕                                       
  f1  b4 → 2.2E¯1 | +1551% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  f10 b4 → 2.1E¯1 | +1422% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Interestingly, f10 performed remarkably close to f1. When I looked at the code for f1, I saw that the author had implemented a similar culling approach and had commented in several places that the construct was to improve performance. Good job! But this does demonstrate that maintaining a parallel, simple copy of words makes the solution run about 15× faster.

Takeaways

There are a couple of other optimizations I could have implemented:

In the setup, I could have filtered out all words that were longer than ≢∊board.
If this FindWords was used a lot, and I could be fairly certain that words was static (unchanging), then I could create mwords outside of FindWords. The line that creates mwords consumes about half of the total time of the function.
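The first of these could look something like the following. This is a sketch only, not code from f0; the three-word list is made up and the board is the 2×2 example from earlier.

```apl
      board←2 2⍴'th' 'r' 'ou' 'gh'
      words←'our' 'through' 'thoroughly'   ⍝ made-up word list
      ≢∊board                              ⍝ total number of letters on the board
7
      (≢¨words)≤≢∊board                    ⍝ words short enough to fit on the board
1 1 0
```

Applying this mask before building mwords would also make that matrix narrower, as its height is set by the longest remaining word.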

When thinking about performance and optimization:

Unless there’s an overwhelming reason to do so – don’t sacrifice code clarity for performance. If you implement non-obvious performance improvements, note them in comments or documentation.
Optimize effectively – infinitely speeding up a piece of code that contributes 1% to an application’s CPU consumption makes no real impact.
Consider how your data is structured and how that might affect performance. In this case, representing words as a vector of words is convenient, readable, and there aren’t those extra spaces that might occur in a matrix format. But as we saw, it performs poorly compared to a simple matrix format. Don’t be afraid to make the data conform to a more performant organization.
Along similar lines, consider how simple arrays are stored in contiguous memory and whether you can take advantage of that.

In case you were wondering, the two solutions Rich Park looked at in his webinar were f4 and f7 in the timings above. The fastest submission, f1, was submitted by Julian Witte. Please remember that we did not specify performance as a criterion for the problem, so this is in no way a criticism of any of the submissions.

If you’re curious to look at the code for the submissions, this .zip file includes namespaces f0–f9, each of which contains a FindWords function and any needed subordinate functions (in addition to the solution namespaces, the .zip file also includes AllWords, the time function and 4 sample boards – b2, b3, b4, and b6). You can extract and use the code as follows:

Unzip Submissions.zip to a directory of your choosing.
In your APL session, enter:
]Link.Import # {the directory you chose}/Submissions
This step might take several seconds when Link.Import brings in AllWords

You can then examine the code, run your own timings, and so on. One interesting thing to explore is which submissions properly handle 1×1 and 0×0 boards.

Postscript

When I started to write the explanation of my code, it occurred to me: “This is 2026 and we have LLMs that might be able to explain the code. Let’s give them a try…”

So, I asked each of Anthropic’s Claude Opus 4.6 Extended, Google’s Gemini Pro, and Microsoft’s Copilot Think Deeper the following:

Explain the attached code. Note that a cell in board can have multiple letters like “qu” or “ough”. Also note that the words list is the official scrabble words list and has no single letter words.

The results were interesting and, in several places, a more concise and coherent explanation than I might produce. But how accurate and useful were their explanations? Stay tuned for a blog post about how well different LLMs explain APL code!

The APL Quest Series

Posted on March 15, 2024 by Adám Brudzewsky

It seemed like a normal Friday until mid-afternoon. But on 4 February 2022, I embarked on a journey that, at the time, seemed to stretch impossibly far into the future — a future that wasn’t entirely known yet. In the APL Orchard chat room on Stack Exchange, a dozen APLers, some experts and some newbies, held the very first APL Quest chat event.

In this first session, we explored the oldest APL Problem Solving Competition phase 1 problem – question number 1 from 2013 – presented our solutions, and discussed them for about half an hour. The following week, I recorded a video where I went through some of these solutions, with the code posted on GitHub. This pattern continued each week; chat events on Fridays, and a follow-up video released (usually) the week after, each session looking at a different phase 1 problem.

APL Quest chat in progress

Inspiration

Originally, the idea came from Richard Savenije, inspired by LeetCode. Both he and Stefan Kruger, who later became my colleague, were frustrated with LeetCode’s assumptions about programming languages – assumptions that didn’t really hold for APL. Aside from that, their test framework didn’t permit APL submissions anyway.

So, Richard suggested that we should make our own problems website, and Stefan pointed out that my colleague Rich Park had already set up a website that offered automatic checking of solutions to simple problems. With the help of a summer intern, Rich had populated the site with all APL Problem Solving Competition phase 1 problems from 2013 until 2021 (the 2022 round was scheduled to launch two months later). Richard suggested using this site for weekly puzzles, one every Friday afternoon, and we found a time that suited the people present.

Next, we had to decide on a format. Should it be a Zoom meeting or a chat event? Earlier, there had been a couple of chat event series in the APL Orchard; the APL Cultivation series ran weekly from October 2017 until May 2018 and semi-weekly from 28 November 2019 until August 2020. Stefan Kruger was in the process of converting these to a since-completed book, APL Cultivations. After some discussions, we decided to have the sessions in chat, and I came up with the idea of recording a short screen cast after each event.

I got the arithmetic wrong, and claimed we’d have 100 weeks worth of problems, as the 2022 problems would be available by the time we had explored the other problems, giving us almost two years of material. Actually, the 2023 problems were ready when we got to them! But either way, the end seemed very far away…

Technical Details

And yet, here we are, after 110 weeks, 110 live chat events, and 110 published videos – a total of over 22 hours of video content! We never missed a week, even on various holidays, though Rich Park did have to substitute for me a few times when I was prevented from hosting, and sometimes I had to push off recording and/or publishing a video until two or three weeks after the chat event. Sometimes, I’d host the chat event while travelling in a car or a train, and sometimes I’d record videos in hotel rooms or in other people’s homes. By always using a plain white (or off-white) background, I was able to get a consistent look, irrespective of where I was, though I sometimes had to move furniture to sit in front of a bare wall, and once I had to drape my bedspread over the hotel room’s television…

Recording an APL Quest session

Speaking of looks, I’ve received praise for the technical quality of my videos; for their smooth integration of presentation and live coding and for their nice design. So, for those who are interested in how I did it…the introductory screen and the problem statement are simply PowerPoint slides with Fade transitions. After the problem description slide, I faded to a blank slide, and from there, I switched application to RIDE (on Microsoft Windows, you have to install it separately), which was running in full-screen mode with the Language Bar and Status Bar hidden. When I started, the newest released RIDE was version 4.3, but I was running pre-releases of v4.4 and Dyalog v18.2 that added the Nord theme which I had fallen in love with in 2021. By matching my slide colours to the RIDE theme, I achieved a seamless transition without having to do any video editing.

I’m somewhat of a typeface enthusiast. Previously, I had searched for what I considered good sans-serif fonts, and for this project, I chose the humanist Go font. For APL code, I went with my own SAX2 font which I had created by extracting letterforms from the old SAX (SHARP APL for UNIX) manual, and then extended to cover the characters necessary today. However, it bothered me that the SAX font looked so thin next to Go, due to being digitised from the golf ball of an IBM Selectric without accounting for the visual weight normally added by the typewriter’s ink ribbon.

I had to hack font selection into RIDE, which I was able to do because RIDE is built using normal web technology, in particular CSS. After some experimentation, I found the way to do it. After I switched to RIDE v4.5, I was able to set the font face in an official manner, but even this wasn’t enough. I had relied on PowerPoint’s reasonable auto-bolding of the SAX font, which otherwise didn’t include bold, but RIDE v4.5 wouldn’t let me style the APL font further, so I had to hack RIDE again!

“Hacking” RIDE v4.5

This time, I didn’t need to modify source files, but rather found that RIDE was “vulnerable” (though not in a dangerous way) to CSS injection through its font input field. Making the font bold didn’t look right, as that would only smear the letters horizontally, but adding a “text stroke” had the desired effect. If you want your RIDE to resemble what you can see in the videos, set the APL font to SAX2')}.monaco-editor *{-webkit-text-stroke:.67px currentcolor}. With the visuals set, I recorded using OBS Studio and, on the rare occasion that I needed to edit something, I used DaVinci Resolve.

Concluding the Series

When we finished that last session, it felt rather anti-climactic, but I can look back on a very enjoyable time. And of course, the efforts that all participants put into this are not forgotten; we’ve got an amazing chat and video series that future APLers can enjoy. Thank you to everyone who contributed; chat participants, video commenters, colleagues (especially Brian Becker who authored all the problems), and most of all my wife, who often had to encourage me to record the next video(s) and also gave me the time and space to do so, even if it meant single-handedly keeping our children quiet. If you’re up for a marathon, you can watch the entire 22-hour series on YouTube, and APL Quest on the APL Wiki includes links to all the problems, chat sessions, code, and videos.

Dyalog ’23 Videos: Week 2 – APL Problem Solving Competition

Posted on November 10, 2023 by Morten Kromberg

The section that is dedicated to the annual APL Problem Solving Competition is always one of my own favourite parts of a Dyalog user meeting, and the talks by the two winners this year were no exception. It is always a treat to hear about how the student winners are able to go from zero knowledge about APL to delivering very well designed, array-oriented, solutions in a few weeks, sometimes days!

This year, we were very pleased to have both the student grand prize winner Andrea Piseri and the professional winner Alexander Block present. Andrea is studying mathematics at Università degli Studi di Milano (University of Milan) and Alexander is an actuary at the Viridium Insurance Group in Germany.

Before Gitte Christensen presented the prizes to the winners and they gave their talks, our “Chief Problem Maker” Brian Becker gave us a brief history of the competition and overview of the contest website (which uses Dyalog-grown tools). He also mentioned that we are revising the format of the competition, most likely by running simpler/smaller problem sets at a higher frequency. We’ll be making official announcements about that early in 2024.

Read more about this section of the user meeting in our Dyalog ’23 daily blog post.

——————————————

Materials for all presentations can be downloaded from the Dyalog ’23 webpage.

Dyalog ’23 Day 4: So Many Problems to Solve

Posted on October 18, 2023 by Dyalog

We started Wednesday with an update to the co-dfns project by Aaron Hsu. Aaron is trying to make APL more accessible to more people for tackling more problems. He explained how version 4 focusses on good performance on GPUs and detailed error reporting – including a parser that can be used for static analysis of APL code outside of Co-dfns – and how version 5 intends to target more platforms, improving integration of APL in other systems. There are even rumours of a JavaScript backend on the horizon! Dyalog available in the browser, wherever you go.

Brandon Wilson presents the challenges of parsing YAML

Brandon Wilson is a relative newcomer to APL. Although his main interest is in AI safety, he has significant experience in mainstream computer systems. This is part of what made him decide to write a YAML (YAML Ain’t Markup Language) parser in APL. Interestingly, most of the existing YAML parsers in use today fail some part of the test suite. This speaks to the complexity of the task and how there are many interactions between different parts of YAML that are not obvious. Brandon is hoping that writing a complete parser the APL way will lead to insights into the YAML specification that he can give back to the YAML community to help the specification developers better communicate what is needed to other parser maintainers.

Next, Josh David highlighted the huge demand for statistics in data science, machine learning, and the increase of data-driven decision-making in business. Although data preparation is easy in APL, he noted the lack of ready-made code for doing statistics. Simple summaries and linear regression take just a few primitives, but Josh showed us a couple of libraries for doing more complex statistical analysis. He demonstrated rapid iteration on multiple linear regression using KokoStats by Dr. Bill Koko, performing multiple tests and seeing the impact of the selected data on the predictive power of the regression model. In Professor Stephen Mansour’s TamStat package, the use of operators reduces the overall number of functions that users need to memorise, and a cross-platform graphical interface makes a great environment for exploring and learning statistics.

Jesús Galan Lopez returned to expand on something that he mentioned in his previous presentation – the modelling of grain growth in solid materials. Students at his university were tasked with reproducing models from published research. They wrote their solutions in Python because it was familiar to them, but Jesús wanted to see how array programming would compare. He found that his APL solution was generally shorter, cleaner, and faster. Of course, he had to compare more like-for-like programs by trying his solution in NumPy as well, and he found Dyalog had comparable performance.

Grand Prize winner Andrea Piseri

Then it was time for the presentation of prizes to winners of this year’s APL Problem Solving Competition. Brian Becker gave a brief history of the competition and overview of the contest website (which uses Dyalog-grown tools). He also announced future changes to the competition, such as quarterly sets of Phase-1-style one-liner problems. Gitte then presented certificates to the student grand prize winner, Andrea Piseri, and professional winner, Alexander Block.

Alexander was first to introduce himself – he is an actuary using APL to solve problems, working in insurance companies in Germany – and talk us through a couple of his solutions. Having used Haskell, he is a big fan of point-free (tacit) programming, and liked his use of the over operator (⍥) in his solution to the Risk attack problem from phase 1 (problem 5).

Professional winner Alexander Block

Andrea Piseri is a mathematics student with a particular interest in abstract algebra and mathematical logic, as well as a programming language enthusiast. Coming from the functional programming world, Andrea first solved the DNA reading frames problem (phase 2, problem 1, task 5) using the “flatmap” pattern, but then came up with another solution leveraging comparison and interval-index to process the whole input at once. When tackling the “make change” problem (phase 2, problem 2, task 3) he was surprised to find APL was not so opinionated and that he could quite easily map iterative and recursive patterns from Haskell onto dfns.

This afternoon was the annual Viking Challenge, and this year the team from Midgaard Event set up a thrilling mystery in which we were split into teams to solve a variety of puzzles. The individual puzzles offered a range of challenges to suit all participants, with the ultimate challenge being to piece clues together in a process of elimination. There was temptation to write an APL program to solve the problem, but it was resisted as nine of the twelve teams managed to work out the solution with pen and paper. Eyes rolled with chagrin all around the auditorium when it was announced that the winning team was the team that included both Gitte and Stine!

The winning team of this year’s Viking Challenge


The 2021 APL Problem Solving Competition: Phase I – Best of Breed

Posted on March 25, 2022 by Guest Blogger

By: Stefan Kruger
Stefan works for IBM making databases. He tries to learn at least one new programming language a year, and a few years ago he got hooked on APL and participated in the competition. This is his perspective on some solutions that the judges picked out – call it the “Judges’ Pick”, if you like; smart, novel, or otherwise noteworthy solutions that can serve as an inspiration.

Congratulations to all the winners of the 2021 APL Problem Solving Competition (you can learn more about the phase 2 winners in this article) and well done to Dzintars Klušs who won the Grand Prize. At the recent Dyalog ’21 user meeting, we got to enjoy the runner-up, Victor Ogunlokun, walking us through his solutions live.

In this post I’ll go through some great solutions that were submitted (and some that weren’t submitted) to the Phase I problems so that we can all marvel at the ingenuity and perhaps learn a thing or two. If you’re feeling inspired by the end, go ahead and participate in this year’s round which just launched.

If you’re new to the APL Problem Solving Competition, Phase I problems tend to be short and the expectation is that solutions will be “one-liners” (dfns). However, although it might seem like it from some of the solutions here, this isn’t a code golf competition! Solutions are judged holistically: do they solve the problem, are they efficient, and are they clear? Even though a few test cases are given, there is no guarantee that your solution is correct just because it works for the example data. The judging process involves running the code on many hidden test cases too. Crucially, just because your code is accepted, it doesn’t necessarily mean that you’ll get full marks.

As with my blog post that reviewed the 2020 Phase II solutions, I’ve included a more in-depth examination of one or two problems.

Problem 1: Are You a Bacteria?

Something from the excellent Project Rosalind problem collection, the task is to compute the combined percentage of guanine (G) and cytosine (C) in a given DNA-string.

Efficiency can vary a lot, depending on whether summation or multiplication (or even division!) is performed first. Some solutions were also leading-axis oriented.

Here’s my solution:

      {100×(+⌿⍵∊'CG')÷≢⍵} 'ACGTACGTACGTACGT'
50

which several competitors made more tacit with:

      {100×(+⌿÷≢)⍵∊'GC'} 'ACGTACGTACGTACGT'
50

or even went further:

      (100×≢÷⍨1⊥∊∘'GC') 'ACGTACGTACGTACGT'
50

If you’re unfamiliar with the 1⊥ trick, it’s a way of summing a vector:

      1⊥6 3 9 8 12 62
100

It’s perhaps not immediately obvious why this should work. Here’s one explanation. Assume we want to sum the vector 1 0 2 0 0. We can do this in a very convoluted way by using a sum inner product with a vector of exponentials: [1⁴, 1³, 1², 1¹, 1⁰]:

      (1*4 3 2 1 0)+.×1 0 2 0 0
3

If we expand the exponentials to the left we get a vector of 1s. We can then break apart the inner product by turning +. to a +⌿ to the left:

      +⌿1 1 1 1 1×1 0 2 0 0
3

This is the textbook definition of 1⊥! Look:

      1⊥1 0 2 0 0
3

which, to be clear, is just the sum-reduce-first:

      +⌿1 0 2 0 0
3

Using 1⊥ to sum has two advantages over the more obvious formulation +⌿. Firstly, it’s easier to use in tacit formulations as it doesn’t require an operator, and secondly, it’s usually faster. The reason for it being quicker is somewhat beyond the scope of this post, but it’s to do with 1⊥ making no guarantees about the ordering of operations, meaning that the interpreter is free to vectorise more efficiently.
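To see why 1⊥ amounts to a sum, it may help to compare it with decode in a more familiar base. This is only a small sketch: the matrix m is made up for illustration, and the default ⎕IO←1 is assumed.

```apl
      10⊥1 2 3          ⍝ base 10: (1×10*2)+(2×10*1)+(3×10*0)
123
      1⊥1 2 3           ⍝ base 1: every weight is a power of 1, so just 1
6
      m←2 3⍴⍳6          ⍝ hypothetical test matrix
      (1⊥m)≡+⌿m         ⍝ on a matrix, 1⊥ sums along the leading axis
1
```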

Problem 2: Index-Of Modified

This problem wanted us to write a function that behaves like the APL Index Of function R←X⍳Y except that it should return 0 for elements of Y not found in X.

I wrote:

      p2 ← {0@((≢⍺)∘<)⊢⍺⍳⍵}
      2 3 p2 ⍳5
0 1 2 0 0

which is basically saying “change all instances of numbers greater than the length of the left argument to zero”, because one more than that length is how X⍳Y reports values that are not found.
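The intermediate values make this easier to follow. A quick sketch using the same arguments as above, assuming the default ⎕IO←1:

```apl
      2 3⍳⍳5            ⍝ items not found get 1+≢⍺, here 3
3 1 2 3 3
      (≢2 3)<2 3⍳⍳5     ⍝ mask of the not-found positions, which @ sets to 0
1 0 0 1 1
```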

Some very different solutions were submitted, for example:

      p2 ← ⍳|⍨1+≢⍤⊣
      2 3 p2 ⍳5
0 1 2 0 0

which is simply:

      p2 ← {(1+≢⍺)|⍺⍳⍵} ⍝ dfn of the above
      2 3 p2 ⍳5
0 1 2 0 0

Another option would have been to multiply ⍺⍳⍵ by the Boolean mask (≢⍺)≥⍺⍳⍵, which zeroes the not-found positions, although no-one submitted exactly this:

      p2 ← ≢⍤⊣(≥×⊢)⍳
      2 3 p2 ⍳5
0 1 2 0 0

which could have been written explicitly as:

      p2 ← {m×(≢⍺)≥m←⍺⍳⍵} ⍝ dfn of the above
      2 3 p2 ⍳5
0 1 2 0 0

Problem 3: Multiplicity

Write a function that:

has a right argument Y which is an integer vector or scalar
has a left argument X which is also an integer vector or scalar
finds which elements of Y are multiples of each element of X and returns them as a vector (in the order of X) of vectors (in the order of Y).

Some test data was provided:

      X ← 2 4 7 3 9
      Y ← 5 7 8 1 12 10 20 16 11 4 2 15 3 18 14 19 13 9 17 6

I wrote something that, in retrospect, looks somewhat clumsy:

      p3 ← {⍵∘{⍵/⍺}¨↓0=⍺∘.|,⍵}
      X p3 Y
┌─────────────────────────┬────────────┬────┬──────────────┬────┐
│8 12 10 20 16 4 2 18 14 6│8 12 20 16 4│7 14│12 15 3 18 9 6│18 9│
└─────────────────────────┴────────────┴────┴──────────────┴────┘

which can be expressed more compactly as:

      p3 ← {/∘⍵¨↓0=⍺∘.|⍥,⍵}
      X p3 Y
┌─────────────────────────┬────────────┬────┬──────────────┬────┐
│8 12 10 20 16 4 2 18 14 6│8 12 20 16 4│7 14│12 15 3 18 9 6│18 9│
└─────────────────────────┴────────────┴────┴──────────────┴────┘

or:

      X (⊢⊂⍤/⍤1⍨0=∘.|⍥,) Y
┌─────────────────────────┬────────────┬────┬──────────────┬────┐
│8 12 10 20 16 4 2 18 14 6│8 12 20 16 4│7 14│12 15 3 18 9 6│18 9│
└─────────────────────────┴────────────┴────┴──────────────┴────┘

although no-one actually submitted that, to everyone’s credit.
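All three variants rest on the same residue table. Here is a reduced illustration with made-up arguments (2 3 on the left, 4 6 9 on the right), not taken from the competition data:

```apl
      2 3∘.|4 6 9       ⍝ residues: rows follow X, columns follow Y
0 0 1
1 0 0
      0=2 3∘.|4 6 9     ⍝ 1 marks "is a multiple"
1 1 0
0 1 1
      1 1 0/4 6 9       ⍝ each row of the mask then compresses Y
4 6
```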

Problem 4: Square Peg, Round Hole

Write a function that:

takes a right argument which is an array of positive numbers representing circle diameters
returns a numeric array of the same shape as the right argument representing the difference between the areas of the circles and the areas of the largest squares that can be inscribed within each circle.

I had to read that many times before it sank in. The key to achieving something snappy is to really work through the maths until it is as compact as possible, which, if you’re anything like me, you didn’t bother to do.

My attempt was:

      p4 ← {(○2*⍨⍵÷2)-2÷⍨⍵*2}

but there are much neater solutions if you did your homework. Here’s one that no-one found:

      p4 ← (○-+⍨)4÷⍨×⍨

and a nice explicit version:

      p4 ← {⍵×⍵×0.5-⍨○÷4}

which can be derived from this simplified mathematical expression, suggested by Rodrigo:

Explanation: The area of the circle is ○r*2, which is ○(⍵÷2)*2, in turn equivalent to ⍵×⍵×○÷4. The area of the square [ABCD] is twice the area of the triangle [ABC]. Given that the area of the triangle is 0.5×⍵×⍵÷2, the area of the square becomes 0.5×⍵×⍵. Putting both together, we get (⍵×⍵×○÷4)-⍵×⍵×0.5, the same as ⍵×⍵×(○÷4)-0.5, which is ⍵×⍵×0.5-⍨○÷4.

Square inside a circle with its diagonal as the circle's diameter
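As a sanity check on the algebra, the long-hand and simplified versions can be compared on a few diameters. The names p4a and p4c are only labels chosen for this sketch, and the comparison relies on the default comparison tolerance ⎕CT:

```apl
      p4a←{(○2*⍨⍵÷2)-2÷⍨⍵*2}    ⍝ area of circle minus area of square, long-hand
      p4c←{⍵×⍵×0.5-⍨○÷4}        ⍝ the simplified form
      p4c 2                     ⍝ diameter 2: circle area ○1, inscribed square area 2
1.141592654
      (p4a 2 5 10)≡p4c 2 5 10   ⍝ the two forms agree within ⎕CT
1
```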

Problem 5: Rect-ify

For this problem, we were asked to plant a number of trees in a rectangular pattern with complete rows and columns, meaning all rows have the same number of trees. That rectangular pattern also needed to be as “square as possible”, meaning there is a minimal difference between the number of rows and columns in the pattern.

Here’s a smart solution, based on the observation that the “most square” choice must have one factor being the largest factor less than or equal to the square root:

      p5 ← {N,⍵÷1⌈N←⌈/0,⍵∨⍳⌊⍵*÷2}

This solution works well on large numbers of trees, too:

      p5 98776512304
280888 351658
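On a small input the individual steps are easy to inspect. A sketch with 12 and 7 trees (values chosen purely for illustration):

```apl
      ⍳⌊12*÷2           ⍝ candidates up to the square root of 12
1 2 3
      12∨⍳⌊12*÷2        ⍝ GCD of 12 with each candidate
1 2 3
      p5←{N,⍵÷1⌈N←⌈/0,⍵∨⍳⌊⍵*÷2}
      p5 12             ⍝ largest GCD is 3, so 3 rows of 4
3 4
      p5 7              ⍝ a prime only allows a single row
1 7
```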

Someone even offered up a recursive solution:

      p5rec ← {⍵=0:2⍴0 ⋄ ⍵ {0=⍵|⍺: ⍵,⍺÷⍵ ⋄ ⍺∇⍵-1} ⌊⍵*÷2}

So is one solution better than the other? Well, they both work correctly, but one is a lot faster than the other. Do you want to guess which was faster before we test it?

      'cmpx'⎕CY'dfns'
      cmpx 'p5 98776512304' 'p5rec 98776512304'
  p5 98776512304    → 8.7E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  p5rec 98776512304 → 6.1E¯3 | -94% ⎕⎕⎕

Surprised? I was! So, what is going on here? The non-recursive solution relies on a rather crude way to find the factors: it computes the GCD of the argument with every integer up to the square root, and for a number this large that is a lot of work. The recursive version starts at the square root and counts down, stopping at the first divisor it finds.

Can we be even smarter? This version was offered up by APL Orchard regular @rak1507:

      p5rak1507 ← {a,⍵÷1⌈a←⊃⌽⍸0=⍵|⍨⍳⌊⍵*.5} ⍝ @rak1507

      cmpx 'p5 98776512304' 'p5rec 98776512304' 'p5rak1507 98776512304'
  p5 98776512304        → 8.7E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  p5rec 98776512304     → 6.3E¯3 | -93% ⎕⎕⎕                                     
  p5rak1507 98776512304 → 4.2E¯3 | -96% ⎕⎕

Basically, (⊢∨⍳) is neat as a code-golf trick, but not great in terms of efficiency.

Problem 6: Fischer Random Chess

According to Wikipedia, Fischer random chess is a variation of the game of chess invented by former world chess champion Bobby Fischer. Fischer random chess employs the same board and pieces as standard chess, but the starting position of the non-pawn pieces on the players’ home ranks is randomised, following certain rules.

White’s non-pawn pieces are placed on the first rank according to the following rules:

the Bishops must be placed on opposite-colour squares
the King must be placed on a square between the rooks.

The task was to write a function that verifies that a given board placement is valid according to these rules.

This was my solution:

      q6 ← {(1=+/(⍵⍳'K')>⍸'R'=⍵)∧1=+/2|⍸'B'=⍵}

but there was a lot of variety in the solutions submitted to this problem. For example:

      q6i  ← {≠/2|⍸'B'=⍵}∧'RKR'≡∩∘'RK'        ⍝ Intersection

      q6ii ← {(≠/2|⍸'B'=⍵)∧1=(⍸'R'=⍵)⍸⍵⍳'K'}  ⍝ Interval index

      q6w  ← {(≠/2|⍸'B'=⍵)∧≠/(⍸'K'=⍵)<⍸'R'=⍵} ⍝ Where (similar to mine above)

The last version there is very amenable to Over:

      q6over ← {I←⍸=∘⍵ ⋄ (2|I'B')∧⍥(≠/)'K'<⍥I'R'}

And for masochists, there is always the famous Progressive Dyadic Iota:

      pd ← {((⍴⍺)⍴⍋⍋⍺,⍵)⍺⍺(⍴⍵)⍴⍋⍋⍵,⍺}
      q6pdi ← {(∧/2\</⍵⍳pd'RKR')∧≠/2|⍵⍳pd'BB'}
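To see the two rules in action, here is the first solution applied to three first-rank strings. The strings are my own examples rather than competition test cases, and ⎕IO←1 is assumed:

```apl
      q6←{(1=+/(⍵⍳'K')>⍸'R'=⍵)∧1=+/2|⍸'B'=⍵}
      q6 'RNBQKBNR'     ⍝ the standard chess set-up is a valid position
1
      q6 'RNBKQNBR'     ⍝ bishops on squares 3 and 7: same colour
0
      q6 'KRBNQBNR'     ⍝ bishops fine, but the king is outside the rooks
0
```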

Problem 7: Can You Feel the Magic?

A square matrix is ‘magic’ if all of its rows and columns and both diagonals sum to the same number.

One hero came up with the following:

      q7 ← (∧/2=/∘∊+/,(+/1 2 2∘⍉))⍉,[0.5]⌽

Here is how it works:

      q7 magic←⎕←3 3⍴4 9 2 3 5 7 8 1 6
4 9 2
3 5 7
8 1 6
1
      q7 nonmagic←⎕←3 3⍴4 9 2 7 5 3 8 1 6
4 9 2
7 5 3
8 1 6
0

The problem statement suggested that dyadic transpose might come in handy, but that’s just showing off! So, how does it work? It’s certainly tacit:

              q7            ⍝ Ouch...
    ┌─────────┴─────────┐
  ┌─┴─┐               ┌─┼─────┐
  / ┌─┼──────┐        ⍉ [0.5] ⌽
┌─┘ 2 ∘    ┌─┼───┐    ┌─┘
∧    ┌┴┐   / , ┌─┴──┐ ,
     / ∊ ┌─┘   /    ∘
   ┌─┘   +   ┌─┘ ┌──┴──┐
   =         +   1 2 2 ⍉

The fork ⍉,[0.5]⌽ takes the argument matrix – a square array of rank-2, shape A A – and returns an array of rank-3, shape 2 A A, where the first cell is the transposed original array and the second is the original array with its rows reversed:

      ]display (⍉,[0.5]⌽) magic
┌┌→────┐
↓↓4 3 8│
││9 5 1│
││2 7 6│
││     │
││2 9 4│
││7 5 3│
││6 1 8│
└└~────┘

We only need to know about the main diagonal of each cell; as you can see, the main diagonal in the second cell is the reverse diagonal of the first cell. We can extract both diagonals with a single dyadic transpose:

      1 2 2⍉(⍉,[0.5]⌽) magic
4 5 6
2 5 8

The same result can be achieved using slightly less showy ⍤ instead, which has the same byte count but is a little easier to understand when first seen:

      1 1⍉⍤2(⍉,[0.5]⌽) magic ⍝ Diagonals of each major cell love ⍤
4 5 6
2 5 8

The remaining part of the tacit formulation untangles easily. Impressive and creative.
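For anyone who would rather see the untangling spelled out, here is one way to step through the rest of the train by hand. The name c is just a temporary introduced for this sketch:

```apl
      magic←3 3⍴4 9 2 3 5 7 8 1 6
      c←(⍉,[0.5]⌽)magic
      +/c               ⍝ first row: column sums of magic; second row: its row sums
15 15 15
15 15 15
      +/1 2 2⍉c         ⍝ sums of the two diagonals
15 15
      ∧/2=/∊(+/c),+/1 2 2⍉c   ⍝ are all eight sums pairwise equal?
1
```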

Here’s another good one that is slightly shorter:

      q7 ← (1=≢∘∪)⍉+⌿⍤,⍥(1 1∘⍉,⊢)⌽
      q7 magic
1
      q7 nonmagic
0

How does that work? The phrase 1 1∘⍉,⊢ prepends the diagonal as a column to the argument matrix:

      (1 1∘⍉,⊢)magic ⍝ Explicit: {(1 1⍉⍵),⍵}
4 4 9 2
5 3 5 7
6 8 1 6

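Clever application of ⍥ says “take the matrix, append its reverse over the diagonal-append operation”. Note that the full train applies this between ⍉⍵ and ⌽⍵, so that the column sums taken later cover the rows of the original matrix as well as its columns; the walk-through below uses ⍵ in place of ⍉⍵ to keep the intermediate results easy to compare with magic: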

      {⍵,⍥{(1 1⍉⍵),⍵}⌽⍵} magic ⍝ love ⍥
4 4 9 2 2 2 9 4
5 3 5 7 5 7 5 3
6 8 1 6 8 6 1 8

We can emphasise the location of the diagonals by using Partitioned Enclose ⊂ to make them stand out a bit:

      1 1 0 0 1 1 0 0 ⊂ {⍵,⍥{(1 1⍉⍵),⍵}⌽⍵} magic
┌─┬─────┬─┬─────┐
│4│4 9 2│2│2 9 4│
│5│3 5 7│5│7 5 3│
│6│8 1 6│8│6 1 8│
└─┴─────┴─┴─────┘

Summing along the leading axis gives:

      {+⌿⍵,⍥{(1 1⍉⍵),⍵}⌽⍵} magic
15 15 15 15 15 15 15 15

Finally, check that all items are equal:

      {1=≢∪+⌿⍵,⍥{(1 1⍉⍵),⍵}⌽⍵} magic ⍝ Length of vector of unique values = 1?
1

In summary, there are two things to note here: using ⍥ to get both diagonals and the use of 1=≢∘∪ to check that all items are equal. If you attended the APL Seeds ’21 conference last March, you’ll recognise this as one of the many ways of solving this problem that Conor Hoekstra presented – see https://dyalog.tv/APLSeeds21/?v=GZuZgCDql6g to watch his presentation.

Any solution that makes use of both of my favourite glyphs (⍤ and ⍥) is a winner in my book.

Problem 8: Time to Make a Difference

Write a function that:

has a right argument that is a numeric scalar or vector of length up to 3, representing a number of [[[days] hours] minutes] – a single number represents minutes, a 2-element vector represents hours and minutes, and a 3-element vector represents days, hours, and minutes
has a similar left argument, although not necessarily the same length as the right argument
returns a single number representing the magnitude of the difference between the arguments in minutes.

Here’s a cool version (several submissions were similar):

      p8 ← |-⍥(1 24 60⊥¯3∘↑)

Nothing too mysterious here. A slight complication is the need to handle a right argument that can be a scalar or a vector of length 2 or 3. The decode function ⊥ expects the argument vector to always be length 3, so we use the take function, dyadic ↑, with ¯3 as the left argument to ensure that the argument is always a vector of the correct length, padding from the left with zeros as required. The mixed radix vector 1 24 60 as the left argument to decode converts to minutes.
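A couple of concrete values may help here. The arguments below are invented for illustration:

```apl
      ¯3↑2 30           ⍝ pad on the left: 0 days, 2 hours, 30 minutes
0 2 30
      1 24 60⊥0 2 30    ⍝ (0×24×60)+(2×60)+30
150
      p8←|-⍥(1 24 60⊥¯3∘↑)
      2 30 p8 1 0 0     ⍝ |150-1440
1290
      45 p8 1 0         ⍝ scalar left argument: |45-60
15
```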

Problem 9: In the Long Run

Write a function that:

has a right argument that is a numeric vector of 2 or more elements representing daily prices of a stock
returns an integer singleton that represents the highest number of consecutive days where the price increased, decreased, or remained the same, relative to the previous day.

I’d like to compare and contrast two solutions, neither of which is tacit for a change:

      p9a ← {≢⍉↑⊂⍨1,2≠/×2-/⍵}
      p9b ← {⌈/¯2-/0,⍸1,⍨2≠/×2-/⍵} ⍝ Flat efficiency

Starting with the first of the two (p9a), from the right, we use a windowed difference reduction to calculate pairwise differences:

      2-/1 3 5 6 6 6 6 6 3 2 1
¯2 ¯2 ¯1 0 0 0 0 3 1 1

and then apply the direction function, monadic ×, to turn this into a vector of ¯1, 0 and 1 if the corresponding item is negative, zero or positive respectively:

      ×2-/1 3 5 6 6 6 6 6 3 2 1
¯1 ¯1 ¯1 0 0 0 0 1 1 1

Another pairwise windowed reduction, this time with ≠, gives us the points of change:

      2≠/×2-/1 3 5 6 6 6 6 6 3 2 1
0 0 1 0 0 0 1 0 0

Prepending a 1, this Boolean vector can be used as the left argument to partitioned enclose, ⊂; a common pattern. But what of the right argument? We can use the same vector as the right argument by using a clever commute, ⍨:

      ⊢m←⊂⍨1,2≠/×2-/1 3 5 6 6 6 6 6 3 2 1 ⍝ Commute to use the same argument left and right
┌─────┬───────┬─────┐
│1 0 0│1 0 0 0│1 0 0│
└─────┴───────┴─────┘

What remains is to find the longest cell in this vector. We could do ⌈/≢¨, but instead this submission found the length of the transpose-mix:

      ≢⍉↑ m
4

A code-golfer’s trick shot, perhaps, and somewhat dubious in terms of efficiency, but certainly cute. If you don’t see why it works, work it through right to left!

The second solution (p9b) uses a lot of the same ideas, but this time we add a 1 to the end of the points-of-change vector:

      {1,⍨2≠/×2-/⍵}1 3 5 6 6 6 6 6 3 2 1
0 0 1 0 0 0 1 0 0 1

and use where, monadic ⍸, to get the indices, prepending a 0 so that we can calculate the length of each segment:

      {0,⍸1,⍨2≠/×2-/⍵}1 3 5 6 6 6 6 6 3 2 1
0 3 7 10

The pairwise difference now represents the length of each segment, and by using a negative window we can commute each pair to get a positive number out for each pair:

      {¯2-/0,⍸1,⍨2≠/×2-/⍵}1 3 5 6 6 6 6 6 3 2 1
3 4 3

and so, for the maximum:

      {⌈/¯2-/0,⍸1,⍨2≠/×2-/⍵}1 3 5 6 6 6 6 6 3 2 1
4

Shall we race them? Of course!

      data ← 10000?10000
      cmpx 'p9a data' 'p9b data'
  p9a data → 2.7E¯4 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  p9b data → 2.1E¯5 | -92% ⎕⎕⎕

The second version is faster for several reasons. We suspected already that the ‘cute’ way to find the longest vector in a nested vector was likely to be slow, as it has to create a huge matrix first, chasing pointers. The second version uses flat numeric vectors throughout, and cuts the work considerably by using where initially to do length calculations on the shorter vector of indices. Flat is fast.

Problem 10: On the Right Side

Write a function that:

has a right argument T that is a character scalar, vector or vector of character vectors/scalars
has a left argument W that is a positive integer specifying the width of the result
returns a right-aligned character array R of shape ((2=|≡T)/≢T),W meaning that R is one of the following:
- a W-wide vector if T is a simple vector or scalar
- a W-wide matrix with the same number of rows as elements of T if T is a vector of vectors/scalars
if an element of T has length greater than W, truncate it after W characters.

The last point is perhaps a bit misleading, but the intention is clear from one of the examples given:

      ⍴⎕←8 (your_function) 'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade
3 8

In this case, “truncate after W characters” means “remove from the left”.
Conceptually, we need to (over)take W characters from the right of each element and mix that into a rank-2 array. To make it work for the edge cases, we should ensure that we can always treat the right argument as a vector of character vectors, using nest, monadic ⊆. This works because if we take more characters than the vector contains, it gets padded using a character-vector’s prototype element, a space.

      8 {↑(-⍺)↑¨⊆⍵} 'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade

An equivalent tacit formulation would be:

      8 (↑-⍤⊣↑¨⊆⍤⊢) 'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade

Here’s a slight variation:

      8 {⌽⍉⍺↑⍉↑⌽¨⊆⍵}'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade

This starts by reversing each cell, then applies a mix and transpose. We then take items from the left before backing out of the transpose and reverse by applying them again.
It can be done in a flatter manner, too:

      8{⍉(-⍺)↑⍉(⊆⍵)⌽∘↑⍨(⊢-⌈/)≢¨⊆⍵} 'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade

If we undo the commute (⍨) and add a few spaces it gets a bit easier to see what’s going on:

      8 {⍉(-⍺)↑⍉ ((⊢-⌈/)≢¨⊆⍵) ⌽↑⊆⍵} 'Longer Phrase' 'APL' 'Parade'
r Phrase
     APL
  Parade

From the right, we turn our input into a character array and then Rotate each row by its length minus the length of the longest row, which implements the right alignment:

      {((⊢-⌈/)≢¨⊆⍵) ⌽↑⊆⍵} 'Longer Phrase' 'APL' 'Parade'
Longer Phrase
          APL
       Parade

What remains is the truncation, which follows similar lines to the earlier versions.

For completeness we can race a couple of these variants. Let’s generate a chunkier data set first: 10,000 random strings of varying lengths up to 50:

      data←{⎕A[?(?50)⍴26]}¨⍳10000
      'cmpx'⎕CY'dfns'
      cmpx '20{↑(-⍺)↑¨⊆⍵}data' '20{⍉(-⍺)↑⍉(⊆⍵)⌽∘↑⍨(⊢-⌈/)≢¨⊆⍵}data' '20{⌽⍉⍺↑⍉↑⌽¨⊆⍵}data'
  20{↑(-⍺)↑¨⊆⍵}data                 → 9.3E¯4 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕             
  20{⍉(-⍺)↑⍉(⊆⍵)⌽∘↑⍨(⊢-⌈/)≢¨⊆⍵}data → 9.0E¯4 |  -3% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕              
  20{⌽⍉⍺↑⍉↑⌽¨⊆⍵}data                → 1.4E¯3 | +46% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The complex, flat version wins, but not by a significant amount.

With that we’ve reached the end. A nice set of problems, with a lot of creative solutions submitted. Watch this space for a review of the Phase II problems…

Highlights of the 2020 Problem Solving Competition – Phase II

Posted on April 30, 2021 by Guest Blogger

With Dyalog’s APL Problem Solving Competition 2021 in full swing, it’s time to highlight some of the excellent solutions that were submitted to last year’s edition.

Stefan Kruger works for IBM making databases. While he tries to learn at least one new programming language a year, he got hooked on APL and participated in the competition. This is his perspective on some solutions that the judges picked out – call it the “Judges’ Pick”, if you like; smart, novel, or otherwise noteworthy solutions that can serve as an inspiration.

This blog post is also available as an interactive Jupyter Notebook document.

By Stefan Kruger

I’ll show a cool solution or two to each Phase II problem and dive into the details of a couple. If you need to refresh your memory with what the problems looked like, there’s a PDF of the Phase II problems.

Oh, and note that at the time of writing there is still plenty of time to take part in the current edition of the competition (and really, who knew bowling was so complicated?) – there are some juicy cash prizes to be won.

Problem 1: Take a Dive (1 task)

Level of Difficulty: Low

So let’s kick off with problem 1. The task was to calculate the score of an Olympic dive, consisting of a technical difficulty rating and a vector containing either 3, 5 or 7 judges’ scores. Only the central three ordered judges’ scores should be considered, which should be summed and multiplied by the technical difficulty rating.

Here is a cunning trick that wasn’t at all obvious:

∇ score←dd DiveScore scores;sorted;cenzored;rotator
  ⍝ 2020 APL Problem Solving Competition Phase II
  ⍝ Problem 1, Task 1 - DiveScore
   
  sorted←{⍵[⍋⍵]}scores
   
  ⍝  0 1 2 rotates score indexes to 123, 23451 or 3456712
  ⍝  So three center values always goes first
  ⍝  51 = (0 1 2∧.= 3 5 7 ∘.|⍳100) ⍳ 1
  rotator←51
 
  cenzored←3↑rotator⌽sorted
  score←⍎2⍕dd+.×cenzored
∇

      2.9 2.6 2.7 DiveScore¨(7 7.5 6.5 8 8 7.5 7)(9.5 8 8.5)(7.5 7 7 8.5 8)
63.8 67.6 60.75

This contestant figured out that if a vector of length 3, 5 or 7 is rotated 51 steps, then the original central three items will always end up at the beginning. No, really. It turns out that 51 is the first number X such that 0 1 2≡3 5 7|X. They tabulated the options and picked the first solution, guessing that it’d be less than 100:

      ⍸0 1 2∧.=3 5 7∘.|⍳100
51
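The effect is easiest to see on letters rather than scores. A tiny sketch with made-up vectors of lengths 3, 5 and 7:

```apl
      3↑51⌽'abc'        ⍝ 3|51 is 0: no rotation
abc
      3↑51⌽'abcde'      ⍝ 5|51 is 1: rotate by one
bcd
      3↑51⌽'abcdefg'    ⍝ 7|51 is 2: rotate by two
cde
```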

But there is another way – this is one of those situations where the Chinese Remainder Theorem comes in handy, especially since it’s available on APLcart:

      3 5 7 {m|⍵+.×⍺(⊣×⊢|∘⊃{0=⍵:1 0 ⋄ (⍵∇⍵|⍺)+.×0 1,⍪1,-⌊⍺÷⍵})¨⍨⍺÷⍨m←×/⍺} 0 1 2 ⍝ https://aplcart.info?q=chinese
51

If you figured that out, award yourself a well-deserved pat on the back. For us mortals, we probably all did something rather more pedestrian:

DiveScore ← {
    d ← 2-2÷⍨7-≢⍵       ⍝ How many items should we drop each side?
    ⍺+.×(-d)↓d↓⍵[⍋⍵]
}

Problem 2 – Another Step in the Proper Direction (1 task)

Level of Difficulty: Medium

Problem 2 builds upon Problem 5 from Phase I. In short, we are asked to write a function Steps that takes a two-element vector to the right, defining a start and end value, and an optional left integer argument that tweaks how we generate values from start to end. The complexity here comes from the many combinations of behaviours from what exactly is given as the left argument: integer or float? positive or negative? Also, the range must be inclusive, even if a floating-point step size means that the end point is overshot. I took this on thinking it would be trivial – it wasn’t.

Here’s a great solution that manages to combine this functionality with a call to a single dfn:

∇ steps←{p}Steps fromTo;segments;width
  width ← |-/fromTo
  :If 0=⎕NC'p' ⍝ No left argument: same as Problem 5 of Phase I
      segments ← 0,⍳width
  :ElseIf p>0 ⍝ p is the step size
      segments ← p {⍵⌊⍺×0,⍳⌈⍵÷⍺} width
  :ElseIf p=0 ⍝ As if we took zero step
      segments ← 0
  :EndIf
  ⍝ Take into account the start point and the direction.
  steps ← fromTo {(⊃⍺)+(-×-/⍺)×⍵} segments
∇
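The positive-step branch and the final line can be tried in isolation. In this sketch the range 1 to 11 and the step 3 are arbitrary choices of mine:

```apl
      width←|-/1 11                 ⍝ distance from start to end
      ⊢segments←3{⍵⌊⍺×0,⍳⌈⍵÷⍺}width ⍝ multiples of the step, capped at width
0 3 6 9 10
      1 11{(⊃⍺)+(-×-/⍺)×⍵}segments  ⍝ offset from the start point, ascending
1 4 7 10 11
      11 1{(⊃⍺)+(-×-/⍺)×⍵}segments  ⍝ same segments, descending
11 8 5 2 1
```

Capping with ⌊ is what keeps the end point in the result when the step does not divide the width evenly.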

I ended up with something more convoluted, with a few ugly special cases, and shamelessly borrowing from dfns.iotag:

Steps ← {
    range ← {
        r ← ⍺-s×⎕IO-⍳⌊1-(⍺-⊃⍵)÷s←×/1↓⍵,(⍺>⊃⍵)/¯1 ⍝ "inspired" by dfns.iotag
        (⊃⍵)≠⊃⊖r: r,⊃⍵ ⋄ r   ⍝ Ensure endpoint is included – yeuch :(
    }
    ⍺ ← ⍬
    (b e) ← ⍵
    ⍺≡⍬: b range e        ⍝ No ⍺
    ⍺=0: b                ⍝ Zero step; return start point
    ⍺>0: b range e ⍺      ⍝ Positive ⍺
    len ← (e-b)÷count←⌊-⍺ ⍝ Negative ⍺
    len=0: b/⍨1+count     
    b range e len
}

Problem 3 – Past Tasks Blast (1 task)

Level of Difficulty: Medium

The task here was to scrape the Dyalog APL Problem Solving Competition webpage to extract all links to PDF files. We get the suggestion to use either Dyalog’s HttpCommand or shell out to a system mechanism for fetching a web page.

To use HttpCommand, we first need to load it:

      ]load HttpCommand
#.HttpCommand

Here’s a slightly tweaked competition submission, showing great flair in how to process XML:

PastTasks ← {
    url ← ⍵
    r ← (HttpCommand.Get url).Data  ⍝ get page contents
    (d n c a t) ← ↓⍉⎕XML r          ⍝ depth; name; content; attributes; type
    (k v) ← ↓⍉ ⊃⍪/ ((,'a')∘≡¨n)/a   ⍝ extract key-value pairs of <a> elements
    urls ← ('href'∘≡¨k)/v           ⍝ get URLs
    pdfs ← ('.pdf'∘≡¨¯4↑¨urls)/urls ⍝ filter .pdfs
    base ← ⊃⌽⊃('base'∘≡¨n)/a        ⍝ base URL
    base∘,¨pdfs
}

The problem statement suggests that a regex-based solution might be tolerable. Here’s a stab at that approach:

PastTasks ← {
    body ← (HttpCommand.Get ⍵).Data
    pdfs ← '<a href="(.+?\.pdf)"'⎕S'\1'⊢body
    base ← '<base href="(.+?)"'⎕S'\1'⊢body
    base,¨pdfs
}

So which is the “better” solution? Well, the first approach has a number of advantages: firstly, it is much more robust (provided that the web page is valid XHTML, which we are told is a given), meaning that we can abdicate responsibility for dealing with markup quirks (single vs double quotes, whitespace etc) to the built-in ⎕XML system function, and secondly, there is that (in)famous quote from Jamie Zawinski:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. – jwz

Mixing a liberal helping of regular expressions in with APL is perhaps not helping APL’s unfair reputation for being write-only.

However, when dealing with patterns in textual data, as we unquestionably are here, regular expressions – even in a powerful language like APL – are sharp tools that are hard to beat, and any programmer worth their salt owes it to themselves to master them. In the case above, had the data not neatly been parseable as XML, it would have been more awkward to solve a problem like this relying only on APL primitives.

Problem 4 – Bioinformatics (2 tasks)

Level of Difficulty: Medium

The two tasks making up Problem 4 are borrowed from Project Rosalind, which is a Bioinformatics problem collection that often has great APL affinity. The problem statement comes with a hint that one benefits from understanding modular multiplication, as this isn’t built into Dyalog APL.

Here is a great example:

revp ← {                    ⍝ r ← revp dna
   dnaNum ← 'ACGT'⍳⍵        ⍝ Convert to 1..4 so that A+T = C+G = 5
   FindRevp ← {             ⍝ Given chunk size, extract positions and build the output format
       chunks ← ⍵,/dnaNum
       isRevp ← (⊢≡5-⌽)¨chunks
       ⍵,⍨⍪⍸isRevp
   }
   ⊃⍪/FindRevp¨4 6 8 10 12  ⍝ Test against all chunk sizes and collect results
}

sset ← {          ⍝ r←sset n
   bin ← 2⊥⍣¯1⊢⍵  ⍝ Binary digits
   arr ← ⌽2*bin   ⍝ Repeated squaring: Starting from MSB and 1, square ⍵, multiply ⍺, modulo m
   mod ← 1000000
   {mod|⍺×⍵*2}/arr,1
}
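Two small experiments show the core of each function. The DNA fragment GCATGC and the helper name step are mine, and ⎕IO←1 is assumed:

```apl
      'ACGT'⍳'GCATGC'               ⍝ A C G T become 1 2 3 4
3 2 1 4 3 2
      (⊢≡5-⌽)'ACGT'⍳'GCATGC'        ⍝ complementing and reversing gives it back
1
      step←{1000000|⍺×⍵*2}          ⍝ the reducing function used in sset
      2 step 1 step 2 step 1        ⍝ bits of 5 are 1 0 1, so arr,1 is 2 1 2 1
32
```

Reading the last expression from the right: 2×1*2 is 2, then 1×2*2 is 4, then 2×4*2 is 32, which is 2*5 as expected.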

This contestant also saw fit to include their test suite; a nice touch! Roger Hui’s version of assert has become the de facto standard, and the contestant puts it to good use:

Assert ← {⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕SIGNAL 8 ⋄ shy←0} ⍝ Roger Hui's Assert

RevpTest ← {
   s ← 'TCAATGCATGCGGGTCTATATGCAT'
   ans ← revp s
   Assert 8 2≡⍴ans:
   Assert 5 4 7 4 17 4 18 4 21 4 4 6 6 6 20 6≡∊ans:
  
   header ← 'Contest2020/Data/'  ⍝ Change as needed
   data1 ← ∊1↓⊃⎕NGET (header,'rosalind_revp_1_dataset.txt') 1
   ans1 ← ↑⍎¨⊃⎕NGET (header,'rosalind_revp_1_output.txt') 1
   data2 ← ∊1↓⊃⎕NGET (header,'rosalind_revp_2_dataset.txt') 1
   ans2 ← ↑⍎¨⊃⎕NGET (header,'rosalind_revp_2_output.txt') 1
   Assert ans1 ≡ revp data1:
   Assert ans2 ≡ revp data2:
   'Test passed'
}

SsetTest ← {
   Assert 8 = sset 3:
   Assert 551872 = sset 857:
   Assert 935424 = sset 870:
   'Test passed'
}

Problem 5 – Future and Present Value (2 tasks)

Level of Difficulty: Medium

Problem 5 is some hedge fund maths, or something where my eyes glazed over before I fully understood the ask. What is this, K‽

This solution is impressively compact – I removed the comments to highlight the APL artistry on display: no less than three scans, count ’em!

rr ← {AR×+\⍺÷AR←×\1+⍵} 
pv ← {+/⍺÷×\1+⍵}

Here’s how the competitor outlined how their solution works:

This can be calculated elegantly with the following operations:

1. Find the accumulated interest rate (AR) for each term (AR←×\1+⍵).
2. Deprecate the cashflow amounts by dividing them by AR. This finds the present value of all the amounts.
3. Accumulate all the present values of the amounts to find the total present value at each term.
4. Multiply by AR to find future values at each term.

This way, the money that was invested or withdrawn in a term is not changed for that term, but the money that came from the previous terms is multiplied by the current interest rate for each term, arriving at the correct recurrence relation:

Step 2)	`amounts[i]/AR[i] ⍝ ≡ PV[i]`
Step 3)	`amounts[i]/AR[i] + APV[i-1]`
Step 4)	`amounts[i] + APV[i-1]×AR[i]`
	`amounts[i] + APV[i-1]×AR[i-1]×(1+rate[i])`
	`amounts[i] + r[i-1]×(1+rate[i]) ⍝ ≡ r[i]`
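
To convince yourself that the scans really do implement the recurrence, here is a small Python sketch of the same two functions using `itertools.accumulate` as a stand-in for APL's scan. The function names follow the APL; the argument names `amounts` and `rates` are assumptions taken from the outline above.

```python
from itertools import accumulate
from operator import mul

def rr(amounts, rates):
    """Future value at each term: AR × +\\ amounts ÷ AR."""
    ar = list(accumulate((1 + r for r in rates), mul))           # AR ← ×\1+⍵
    running_pv = accumulate(a / f for a, f in zip(amounts, ar))  # +\⍺÷AR
    return [f * p for f, p in zip(ar, running_pv)]

def pv(amounts, rates):
    """Total present value: +/ amounts ÷ ×\\1+rates."""
    ar = accumulate((1 + r for r in rates), mul)
    return sum(a / f for a, f in zip(amounts, ar))
```

Checking `rr` against a plain loop that applies `r[i] = amounts[i] + r[i-1]×(1+rate[i])` term by term gives the same numbers up to floating-point rounding.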

Problem 6 – Merge (1 task)

Level of Difficulty: Medium

Mail merge – gotta love it. Your spam folder is full of bad examples of this: “Dear $FIRSTNAME, do you want to purchase a bridge?” We’re given a template file with patterns such as @firstname@ which are to be replaced with values stored in a JSON file. Here’s a smart approach from a competitor who knows their way around the @ operator:

Merge ← {
   templateFile ← ⍺
   jsonFile ← ⍵
   template ← ⊃⎕NGET templateFile
   ns ← ⎕JSON⊃⎕NGET jsonFile

   getValue ← {
       0=⍴⍵:,'@'   ⍝ '@@'         → ,'@'
       6::'???'    ⍝ ~⍵∊ns.⎕NL ¯2 → '???'
       ⍕ns⍎⍵       ⍝  ⍵∊ns.⎕NL ¯2 → ⍕ns.⍵
   }
   ∊getValue¨@(⍴⍴1 0⍨)'@'(1↓¨=⊂⊢)template
}

The key insight here is that since each template starts and ends with the same marker, we can partition the data on sections beginning with @ and then we’ll have a vector where every other element is a template to be substituted. Here’s an example of this:

      ↑('@'(1↓¨=⊂⊢) '@title@ @firstname@ @lastname@, would you be interested in the Brooklyn Bridge?') (1 0 1 0 1 0)
┌─────┬─┬─────────┬─┬────────┬─────────────────────────────────────────────────┐
│title│ │firstname│ │lastname│, would you be interested in the Brooklyn Bridge?│
├─────┼─┼─────────┼─┼────────┼─────────────────────────────────────────────────┤
│1    │0│1        │0│1       │0                                                │
└─────┴─┴─────────┴─┴────────┴─────────────────────────────────────────────────┘

I added the second row for clarity to show the alternating templates. Cool, huh? However, this only works correctly if the data leads with a template. Consider:

      '@'(1↓¨=⊂⊢) 'Dear @firstname@ @lastname@, or maybe the Golden Gate?'
┌─────────┬─┬────────┬───────────────────────────┐
│firstname│ │lastname│, or maybe the Golden Gate?│
└─────────┴─┴────────┴───────────────────────────┘

We still have the alternating templates, but the prefix (Dear ) is lost. We can tweak the Merge function a bit to cater for this if we need to:

Merge ← {
    templateFile ← ⍺
    jsonFile ← ⍵
    template ← ⊃⎕NGET templateFile
    ns ← ⎕JSON⊃⎕NGET jsonFile
    first ← template⍳'@'
    first>≢template: template      ⍝ No templates at all
    prefix ← (first-1)↑template    ⍝ Anything preceding the first '@'?

    getValue ← {
        0=⍴⍵:,'@'   ⍝ '@@'         → ,'@'
        6::'???'    ⍝ ~⍵∊ns.⎕NL ¯2 → '???'
        ⍕ns⍎⍵       ⍝  ⍵∊ns.⎕NL ¯2 → ⍕ns.⍵
    }
    ∊prefix,getValue¨@(⍴⍴1 0⍨)'@'(1↓¨=⊂⊢)template
}
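
The "every other element is a template" idea carries over to other languages too. Here is a hedged Python sketch: `merge` and its `values` dictionary are hypothetical names, and it takes the template text and the values directly rather than reading files. Because `str.split` keeps the text before the first `@`, the prefix comes along for free.

```python
def merge(template, values):
    """Replace @name@ markers; '@@' becomes '@', unknown names become '???'."""
    parts = template.split("@")
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(part)           # literal text between markers
        elif part == "":
            out.append("@")            # '@@' escapes a literal '@'
        else:
            out.append(str(values.get(part, "???")))
    return "".join(out)
```

Like the APL version, this assumes the markers are balanced; a stray unpaired `@` is not handled.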

Now, the competition is pitched such that “proper array solutions” are preferred – and for good reasons, most of the time. However, it’s hard to overlook some industrial regex action in this case. Strictly for Perl fans:

Merge ← {
    mrg ← ⎕JSON⊃⎕NGET ⍵
    keys ← mrg.⎕NL¯2
    vals ← mrg.⍎¨keys

    ('@',¨(keys,'' '[^@]+'),¨'@')⎕R((⍕¨vals),'@' '???')⊃⎕NGET ⍺
}

Problem 7 – UPC (3 tasks)

Level of Difficulty: Medium

Problem 7 had us learning more about bar codes than we ever thought necessary. Read them, write them, verify them, scan them – forwards and backwards no less. Good scope for stretching your array muscles on this one. The eagle-eyed amongst you may have spotted that the verification aspect is a simplified version of Luhn’s algorithm, which a certain Morten Kromberg used to illustrate APL’s array capabilities at JIO a while back.

Here’s a good solution:

CheckDigit ← (10|∘-+.×∘(11⍴3 1))          ⍝ Computes the check digit for a UPC-A barcode.

UPCRD ← 114 102 108 66 92 78 80 68 72 116 ⍝ Right digits of a UPC-A barcode, base 10.
bUPCRD ← ⍉2∘⊥⍣¯1⊢UPCRD                    ⍝ Bit matrix with one right digit per row.

WriteUPC ← {
   ⍝ Writes the bits of a UPC-A barcode.  
   ~((11∘=≢)∧(∧/0∘≤∧≤∘9))⍵: ¯1            ⍝ Check for simple errors
   b ← bUPCRD[⍵,CheckDigit ⍵;]  
   1 0 1, (,~6↑b), 0 1 0 1 0, (,6↓b), 1 0 1 
}

ReadUPC ← {
   ⍝ Reads a UPC-A barcode into its digits.
   ~(∧/0∘≤∧≤∘1)⍵: ¯1                 ⍝ Input isn't a bit vector
   95≠≢⍵: ¯1                         ⍝ Number of bits must be 95
   (b l m r e) ← ⍵ ⊂⍨ (∊¯1∘↓,⌽) (3↑1)(42↑1)(5↑1)
   
   b ∨⍥(≢∘1 0 1) e: ¯1               ⍝ Wrong patterns for the guards
   m≢0 1 0 1 0: ¯1
   bits ← ↓12 7⍴ l,r
   C ← (↓bUPCRD)∘⍳ ~@(⍳6)            ⍝ Convert bits to digits
   tf ← ~∧/10 > nums ← C bits        ⍝ Should we try flipping the bits?
   nums ← (nums×1-tf) + tf×C⌽↓⌽↑bits
   ∨/10=nums: ¯1                     ⍝ Bits simply aren't right
   (¯1↑nums)≠CheckDigit 11↑nums: ¯1  ⍝ Bad check digit
   nums
}
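
If the tacit `CheckDigit` is hard to read, the following Python sketch spells out the same weighted sum and the layout that `WriteUPC` builds. It is an illustration under my own naming (`RIGHT`, `check_digit`, `bits7`, `write_upc`), reusing the right-digit table from the solution above; the left-hand digits are the bitwise complement of the right-hand patterns.

```python
# Right-hand digit patterns, as in UPCRD above (7 bits each).
RIGHT = [114, 102, 108, 66, 92, 78, 80, 68, 72, 116]

def check_digit(digits):
    """Weights 3 1 3 1 ... over 11 digits, negate, take modulo 10."""
    weights = [3, 1] * 5 + [3]
    return -sum(d * w for d, w in zip(digits, weights)) % 10

def bits7(n):
    """7-bit binary expansion, most significant bit first."""
    return [int(c) for c in format(n, "07b")]

def write_upc(digits):
    """Return the 95 bits of a UPC-A barcode, or -1 on bad input."""
    if len(digits) != 11 or any(not 0 <= d <= 9 for d in digits):
        return -1
    rows = [bits7(RIGHT[d]) for d in list(digits) + [check_digit(digits)]]
    left = [1 - b for row in rows[:6] for b in row]   # complemented patterns
    right = [b for row in rows[6:] for b in row]
    return [1, 0, 1] + left + [0, 1, 0, 1, 0] + right + [1, 0, 1]
```

For the digits 0 3 6 0 0 0 2 9 1 4 5 the weighted sum is 58, so the check digit is 2.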

Problem 8 – Balancing the Scales (1 task)

Level of Difficulty: Hard

Our task is to partition a set of numbers into two groups of equal sum if this is possible, or return ⍬ if not. This is a well-known NP-complete problem called the Partition Problem and, as such, has no known polynomial-time exact solution. The problem statement indicates that we only need to consider a set of 20 numbers or fewer, which is a bit of a hint on what kind of solution is expected.

This problem, in common with many other NP-hard problems, also has a plethora of interesting heuristic solutions: polynomial algorithms that, whilst not guaranteed to always find the optimal solution, will either get close or be correct for a significant subset of the problem domain in a fraction of the time the exact algorithms would take.

However, it’s clear that Dyalog expects us to give an exact solution, and has given us an upper bound on the input data length. Finally, we’re offered the cryptic advice that

Understanding the nuances of the problem is the key to developing a good algorithm.

Yes, thank you, master Yoda.

Here’s a great, efficient solution:

Balance←{
   sum←1⊥⍵
   2|sum: ⍬   ⍝ Lists with an odd sum cannot be split into equal parts.
   halfsum←sum÷2
  
   ⍝ A partitioning method based on the algorithm by Horowitz and Sahni.
   ⍝ The basic idea of the algorithm is to split the input into two parts,
   ⍝ and then generate all subset sums for these parts. Then the problem
   ⍝ becomes finding a sum of two subset sums from different parts
   ⍝ equal to the desired value. Instead of sorting the sums and comparing
   ⍝ them like in the original algorithm, standard APL searching primitives
   ⍝ ∊ and ⍳ are used. Another key idea is to generate the subset sums
   ⍝ in a specific order, so that the nth subset sum in the vectors a and b
   ⍝ is the sum of the elements chosen by the binary representation of n.
   ⍝ This means that we can get the elements of the solution sum
   ⍝ without having to generate anything but the sums.
   horowitzsahni←{
       s←⍵(↑{⍺⍵}↓)⍨⌊2÷⍨≢⍵                          ⍝ Split the input.
       a b←⊃¨(⊢,+)/¨s,¨0                           ⍝ Generate the subset sums.
       indexes←a {(⊢,⍵⍳⍺⌷⍨(≢⍺)⌊⊢)1⍳⍨⍺∊⍵} halfsum-b ⍝ Search for solution indexes.
       indexes[2]>≢b: ⍬
       ⍵ {(⍺/⍨~⍵)(⍵/⍺)} ∊(2⍴¨⍨≢¨s)⊤¨indexes-1      ⍝ Get the solution from the indexes.
   }
  
   ⍝ A simple exhaustive search. It uses the same binary representation
   ⍝ idea as the horowitzsahni function.
   exhaustive←{
       i←halfsum⍳⍨⊃(⊢,+)/⍵,0
       i>2*≢⍵: ⍬
       ⍵ {(⍺/⍨~⍵)(⍵/⍺)} (2⍴⍨≢⍵)⊤i-1
   }

   ⍝ The exhaustive method performs better than the Horowitz-Sahni method
   ⍝ for small input sizes. 14 seems to be a reasonable cutoff point.
   14>≢⍵: exhaustive ⍵
   horowitzsahni ⍵
}

There are a number of clever touches here – there are actually two different solutions, an exhaustive search and an implementation of the algorithm due to Horowitz and Sahni, which, although still exponential, is known to be one of the fastest for certain subsets and input sizes. A switch based on input size checks for the crossover point and chooses the fastest option. And this is fast – five times faster than that of the Grand Prize winner, and four orders of magnitude faster than the slowest solution.
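
The meet-in-the-middle idea is perhaps easier to see in a scalar language. Below is a rough Python sketch of it, not a translation of the competitor's code. It uses a dictionary lookup where the APL uses `∊` and `⍳`, it returns `[]` where the APL returns ⍬, and the names `subset_sums` and `balance_mitm` are mine. The ordering trick is the same: the nth subset sum is the sum of the elements picked out by the binary digits of n, so the indexes alone are enough to rebuild the solution.

```python
def subset_sums(xs):
    """sums[n] is the sum of xs[k] for every bit k set in n."""
    sums = [0]
    for x in xs:
        sums += [s + x for s in sums]
    return sums

def balance_mitm(xs):
    """Split xs into two groups of equal sum, or return [] if impossible."""
    total = sum(xs)
    if total % 2:
        return []                       # odd totals cannot be split evenly
    half = total // 2
    mid = len(xs) // 2
    left, right = xs[:mid], xs[mid:]
    a, b = subset_sums(left), subset_sums(right)
    first_index = {}
    for j, s in enumerate(b):
        first_index.setdefault(s, j)    # remember the first index of each sum
    for i, s in enumerate(a):
        j = first_index.get(half - s)
        if j is not None:
            mask = [(i >> k) & 1 for k in range(len(left))]
            mask += [(j >> k) & 1 for k in range(len(right))]
            chosen = [x for x, m in zip(xs, mask) if m]
            rest = [x for x, m in zip(xs, mask) if not m]
            return chosen, rest
    return []
```

For 20 numbers this builds two tables of 1024 sums each instead of one table of 1048576, which is where the speed comes from.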

Such a performance spread is intriguing, so there are clearly lessons to be learned here. When I tried this problem, I ended up with a pretty straightforward (a.k.a. naive) brute force search:

Balance ← {⎕IO←0
    total ← +/⍵
    2|total: ⍬             ⍝ Sum must be divisible by 2
    psum ← total÷2         ⍝ Our target partition sum
    bitp ← ⍉2∘⊥⍣¯1⍳2*≢⍵    ⍝ All possible bit patterns up to ≢⍵
    idx ← ⍸<\psum=bitp+.×⍵ ⍝ First index of partition sum = target
    ⍬≡idx: ⍬               ⍝ If we have no 1s, there is no solution
    part ← idx⌷bitp        ⍝ Partition corresponding to solution index
    (part/⍵)(⍵/⍨~part)     ⍝ Compress input by solution pattern and inverse
}

If you come to APL from a scalar language, that approach must seem incredibly wasteful: make all bit patterns. Try all sums. Search for the right one, if it exists. But as it turns out, this is APL home turf advantage. Let’s try to demonstrate this point. If you did this “loop and branch”, you’d iterate over the bit patterns and stop once you find the first solution – in fact, for the test data in the problem specification, the first solution appears at around the 1,500th bit pattern if you generate them as I do above. The vector version would need to consider the whole space of around

      ¯1+2*20
1048575

a million or so, so quite a difference. Surely, in this case the scalar approach should be way faster? Only one way to find out. We can make a scalar version in several ways – here’s the “Scheme” version:

BalanceScalar ← {⎕IO←0     ⍝ Warning: this is not the APL Way, as we shall see.
    total ← +/⍵
    2|total: ⍬             ⍝ Sum must be divisible by 2
    psum ← total÷2         ⍝ Our target partition sum
    data ← ⍵
    bitp ← ↓⍉2∘⊥⍣¯1⍳2*≢⍵   ⍝ Pre-compute the bit patterns
    {                      ⍝ Try one sum after the other, halt on first solution
        0=⍵: ⍬
        patt ← ⍵⊃bitp
        psum=patt+.×data: (patt/data)(data/⍨~patt) ⍝ Exit on first solution found
        ∇¯1+⍵
    } ¯1+≢bitp
}

Dyalog’s got game when it comes to tail call optimisation, right? OK, let’s race:

      'cmpx'⎕CY'dfns'
      d ← 10 81 98 27 28 5 1 46 63 99 25 39 84 87 76 85 78 64 41 93
      cmpx 'Balance d' 'BalanceScalar d'
  Balance d       → 2.7E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕            
* BalanceScalar d → 3.9E¯2 | +43% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Vectorisation, Boolean vectors and primitive functions win the day. We didn’t go completely scalar, to be fair, as we still pre-computed all the binary patterns.

But back to the task at hand – let’s pit ourselves against the intellectual might of Horowitz and Sahni:

horowitzsahni←{
    sum←1⊥⍵
    2|sum: ⍬   ⍝ Lists with an odd sum cannot be split into equal parts.
    halfsum←sum÷2
    s←⍵(↑{⍺⍵}↓)⍨⌊2÷⍨≢⍵                          ⍝ Split the input.
    a b←⊃¨(⊢,+)/¨s,¨0                           ⍝ Generate the subset sums.
    indexes←a {(⊢,⍵⍳⍺⌷⍨(≢⍺)⌊⊢)1⍳⍨⍺∊⍵} halfsum-b ⍝ Search for solution indexes.
    indexes[2]>≢b: ⍬
    ⍵ {(⍺/⍨~⍵)(⍵/⍺)} ∊(2⍴¨⍨≢¨s)⊤¨indexes-1      ⍝ Get the solution from the indexes.
}

      cmpx 'horowitzsahni d' 'Balance d' 'BalanceScalar d'
  horowitzsahni d → 4.7E¯5 |      0%                                         
* Balance d       → 2.8E¯2 | +59266% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕            
  BalanceScalar d → 4.0E¯2 | +84466% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Ouch! Well, told you my exhaustive search was naive. An impressive performance from the competitor – but also an impressive performance from Dyalog APL – even my knocked-up exhaustive search runs in a pretty decent 25–30ms or so, about half the time of my shoddy Python attempt (although out-speeding Python is a low bar). I’m keeping the above implementation of Horowitz/Sahni handy for the next edition of Advent of Code, where this problem always seems to crop up in some shape or form.

Problem 9 – Upwardly Mobile (1 task)

Level of Difficulty: Hard

And so for the final question. We were offered strong hints that a neat array-oriented solution might not be possible, but that the judges were prepared to be proven wrong.

Here’s a nicely compact, recursive solution:

∇ weights ← Weights filename;diag;FindWeights;start
    diag ← ↑(≠∘(⎕UCS 10)⊆⊢)⊃⎕NGET filename
    FindWeights ← {
        '┌┐│'∊⍨⊃⍵: ∇1↓⍵                    ⍝ if on any of these, go down        
        ⎕A∊⍨⊃⍵: ⎕A=⊃⍵                      ⍝ if on a letter, give weights
        r_disp ← '┐'⍳⍨0⌷⍵                  ⍝ otherwise, (i.e. on '┴'), find the displacement of right branch,
        l_disp ← -1+'┌'⍳⍨⌽0⌷⍵              ⍝ ...and the left branch
        wts ← ↑(∇r_disp⌽⍵)(∇l_disp⌽⍵)      ⍝ recurse,
        +⌿wts×[0]⌽(+/wts)×r_disp (-l_disp) ⍝ ...and calculate new weights
    }
    start ← diag⌽⍨⍸'┴│'∊⍨0⌷diag            ⍝ starting position attained by ⌽'ing to '┴' or '│'
    weights ← (~∘0÷∨/)FindWeights start    ⍝ remove 0s and get lowest weights
∇

Finally, someone took the suggestion that an array-based solution might not be possible as a personal challenge and produced the following:

Weights ← {
    m  ← ↑(⎕UCS 10)(≠⊆⊢)⊃⎕NGET ⍵ ⍝ no empty lines midway through so this is fine
    fm ← m='┴'               ⍝ fulcrum mask
    ER ← {+\1-⍵\¯2-⌿0⍪⍸⍵}    ⍝ distance to closest 1 to the left
      
    wa ← +/,m∊⎕A             ⍝ weight amount
    wi ← (⍳wa)@{⍵} m∊⎕A      ⍝ weight indexes
    fa ← +/,fm               ⍝ fulcrum amount
    fir← wa + ⍳fa            ⍝ fulcrum indexes (reduced)
    fi ← fir@{⍵} fm          ⍝ fulcrum indexes
    ai ← fi+wi               ⍝ all indexes
    ai+← ⍉(m∊'┌┐') {⍺\⍵/⍨⍵≠0}⍤1⍥⍉ 0@1⊢ai ⍝ extend indexes upwards to the ┌┐s that need them (exclude top ┴ as it isn't matched)
      
    ld ←  ER⍤1⊢ m='┌'        ⍝ distance to left
    rd ← ⌽ER⍤1⌽ m='┐'        ⍝ distance to right
    xp ← (⍴m)⍴⍳2⊃⍴m          ⍝ x position
    fml← ↓fm                 ⍝ fulcrum mask & its lines
    ail← ↓ai                 ⍝ all index lines
    GET← {⊃,/ail⌷⍨∘⊂¨fml/¨⍵} ⍝ get an item of ai for each fulcrum at x position ⍵
    lir← GET ↓xp-ld          ⍝ left indexes (reduced)
    rir← GET ↓xp+rd          ⍝ right indexes (reduced)
    ldr← fm /⍥, ld           ⍝ left distance (reduced)
    rdr← fm /⍥, rd           ⍝ right distance (reduced)
      
    in ← ↑⊃{(+/⍵[⍺])@(⊃⍺)⊢ ⍵}/ (↓⍉↑fir lir rir) , ⊂↓(⍳fa+wa)∘.=⍳wa ⍝ included weights for each index
    cf ← (ldr ×⍤¯1⊢ in[lir;]) - rdr ×⍤¯1⊢ in[rir;] ⍝ coefficients
    ws ← (1,(≢cf)⍴0) ⌹ ((2⊃⍴cf)↑1)⍪cf              ⍝ unscaled weights
    (⊢÷∨/) ws                                      ⍝ scale weights to integers
}

I take my hat off in admiration of the audacity: “An array solution might not be possible, eh? Hold my beer.”

So there we have it, a smörgåsbord of clever solutions to serve as an inspiration for us all. The 2020 edition of the competition sported a slightly simplified format where you were expected to tackle every problem instead of the approach in previous years where you had to make a subset selection from themed groups – this new approach remains for the current (2021) edition.

You are taking part, aren’t you?
