Mind Boggling Performance

Or is it Minding Boggle Performance?

Better late than never? This was a blog post I started to write during COVID-19 and now I’ve finally gotten around to finishing it.

In the 2019 APL Problem Solving Competition, we presented a problem based on the game of Boggle. In Boggle, a player tries to make as many words as possible from contiguous letters in a 4×4 grid, with the stipulation that you cannot reuse a position on the board.

Rich Park’s webinar from 17 October 2019 presents, among other things, a very good discussion and comparison of two interesting solutions submitted by Rasmus Précenth and Torsten Grust. As part of that discussion, Rich explores the performance of their solutions. After seeing that webinar, I was curious about how my solution might perform.

Disclaimer

Please take note that performance was not mentioned as one of the criteria for this problem other than the implicit expectation that code completes in a reasonable amount of time. As such, this post is in no way intended to criticize anyone’s solutions – in fact, in many cases I’m impressed by the elegance of the solutions and their application of array-oriented thinking. I have no doubt that had we made performance a primary judging criterion, people would have taken it into consideration and possibly produced somewhat different code.

Goals

I started writing APL in 1975 at the age of 14 and “grew up” in the days of mainframe APL when CPU cycles and memory were precious commodities. This made me develop an eye towards writing efficient code. In developing my solution and writing this post, I had a few goals in mind:

  • Use straightforward algorithm optimizations and neither leverage nor avoid any specific features of the interpreter. Having a bit of understanding of how APL stores its data helps, though.
  • Illustrate some approaches to optimization that may be generally applicable.
  • Encourage discussion and your participation. I don’t present my solution as the paragon of performance. I’m sure there are further optimizations that can be made and hope you’ll (gently) suggest some.

The task was to write a function called FindWords that has the syntax:

      found←words FindWords board

where:

  • words is a vector of words. We used Collins Scrabble Words, a ≈280,000-word list used by tournament Scrabble™ players. We store this in a variable called AllWords. Note that single-letter words like “a” and “I” are not legitimate Scrabble words.
  • board is a matrix where each cell contains one or more letters. A standard Boggle board is 4×4.
  • the result, found, is a vector that is a subset of words, containing the words that can be made from board without revisiting any cells.

Although the actual Boggle game uses only words of 3 letters or more, for this problem we permit words of 2 or more letters.

Here’s an example of a 2×2 board:

     AllWords FindWords ⎕← b2← 2 2⍴'th' 'r' 'ou' 'gh'
┌──┬──┐
│th│r │
├──┼──┤
│ou│gh│
└──┴──┘
┌──┬───┬────┬─────┬─────┬──────┬───────┐
│ou│our│thou│rough│routh│though│through│
└──┴───┴────┴─────┴─────┴──────┴───────┘

First, let’s define some variables that we’ll use in our exploration:

      b4← 4 4⍴ 't' 'p' 'qu' 'a' 's' 'l' 'g' 'i' 'r' 'u' 't' 'e' 'i' 'i' 'n' 'a' ⍝ 4×4 board
      b6← 6 6⍴'jbcdcmvueglxriybgeiganuylvonxkfeoqld' ⍝ 6×6 board

If you’re using Dyalog v20.0 or later, you can represent this using array notation:

⍝ using array notation with single-line input:
      b4←['t' 'p' 'qu' 'a' ⋄ 's' 'l' 'g' 'i' ⋄ 'r' 'u' 't' 'e' ⋄ 'i' 'i' 'n' 'a']
      b6←['jbcdcm' ⋄ 'vueglx' ⋄ 'riybge' ⋄ 'iganuy' ⋄ 'lvonxk' ⋄ 'feoqld']

⍝ or, using array notation with multi-line input:
      b4←['t' 'p' 'qu' 'a'
          'slgi'
          'rute'
          'iina']

      b6←['jbcdcm'
          'vueglx'
          'riybge'
          'iganuy'
          'lvonxk'
          'feoqld']

The representation does not affect the performance or the result:

      b4 b6
┌──────────┬──────┐
│┌─┬─┬──┬─┐│jbcdcm│
││t│p│qu│a││vueglx│
│├─┼─┼──┼─┤│riybge│
││s│l│g │i││iganuy│
│├─┼─┼──┼─┤│lvonxk│
││r│u│t │e││feoqld│
│├─┼─┼──┼─┤│      │
││i│i│n │a││      │
│└─┴─┴──┴─┘│      │
└──────────┴──────┘

There were 9 correct solutions submitted for this problem. We’ll call them f1 through f9 – my solution is f0. Now let’s run some comparative timings using cmpx from the dfns workspace. cmpx will note whether any of the latter expressions returns a different result from the first expression. We take the tally (≢) of the resulting word lists to make sure the expressions all return the same result, assuming that the sets of words are the same if the counts are the same. These timings were done using Dyalog v20.0 with a maximum workspace (MAXWS) of 1GB running under Windows 11 Pro. To keep the expressions brief I bound AllWords as the left argument to each of the solution functions:

      f0←≢AllWords∘#.Brian.Problems.FindWords

To make it easier to run timings, I wrote a simple function to call cmpx with the solutions of my choosing (the default is all solutions).

      )copy dfns cmpx
      time←{⍺←¯1+⍳10 ⋄ cmpx('f',⍕,' ',⍵⍨)¨⍺}

This allows me to compare any 2 or more solutions, or by default, all solutions on a given board variable name.

      time 'b4' ⍝ try a "standard" 4×4 Boggle board
  f0 b4 → 1.2E¯2 |      0%
  f1 b4 → 2.3E¯1 |  +1791% ⎕⎕
  f2 b4 → 4.3E¯1 |  +3491% ⎕⎕⎕
  f3 b4 → 7.3E¯1 |  +5941% ⎕⎕⎕⎕⎕                                    
  f4 b4 → 5.6E0  | +46600% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b4 → 1.1E0  |  +8758% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f6 b4 → 8.9E¯1 |  +7325% ⎕⎕⎕⎕⎕⎕                                   
  f7 b4 → 1.1E0  |  +9275% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
  f8 b4 → 4.2E0  | +34750% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕           
  f9 b4 → 2.0E0  | +16291% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                     

If we try to run on the 6×6 sample:

      0 1 2 3 4 5 6 7 9 time'b6'
  f0 b6 → 2.2E¯2 |       0%                                          
  f1 b6 → 3.7E¯1 |   +1577%                                          
  f2 b6 → 1.0E0  |   +4581%                                          
  f3 b6 → 1.5E0  |   +6872%                                          
  f4 b6 → 1.6E2  | +705950% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  f5 b6 → 2.3E0  |  +10409% ⎕                                        
  f6 b6 → 1.9E0  |   +8463%                                          
  f7 b6 → 2.0E0  |   +9131% ⎕
  f9 b6 → 2.4E0  |  +10822% ⎕

f8 is excluded as it would cause a WS FULL in my 1GB workspace.

Why is f0 about 16-18 times faster than the next fastest solution, f1? I didn’t set out to make FindWords fast, it just turned out that way. Let’s take a look at the code…

     ∇ r←words FindWords board;inds;neighbors;paths;stubs;nextcells;mwords;mask;next;found;lens;n;m;map
[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell
[3]    paths←⊂¨,inds                                     ⍝ initial paths
[4]    stubs←,¨,board                                    ⍝ initial stubs of words
[5]    nextcells←neighbors∘{⊂¨(⊃⍺[¯1↑⍵])~⍵}              ⍝ append unused neighbors to path
[6]    mwords←⍉↑words                                    ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=
[7]    mask←mwords[1;]∊⊃¨,board                          ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board              ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                                      ⍝ keep those words
[10]   mwords/⍨←mask                                     ⍝ keep them in the matrix form as well
[11]   r←words∩stubs                                     ⍝ seed result with any words that may already be formed from single cell
[12]   :While (0∊⍴paths)⍱0∊⍴words                        ⍝ while we have both paths to follow and words to look at
[13]       next←nextcells¨paths                          ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                     ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next             ⍝ append the next letters to each stub
[16]       r,←words∩stubs                                ⍝ add any matching words
[17]       mask←(≢words)⍴0                               ⍝ build a mask to remove word beginnings that don't match any stubs
[18]       found←(≢stubs)⍴0                              ⍝ build a mask to remove stubs that no words begin with
[19]       lens←≢¨stubs                                  ⍝ length of each stub
[20]       :For n :In ∪lens                              ⍝ for each unique stub length
[21]           m←n=lens                                  ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords                 ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                               ⍝ words that match
[24]           found[(∨/map)/⍸m]←1                       ⍝ stubs that match
[25]       :EndFor
[26]       paths/⍨←found                                 ⍝ keep paths that match
[27]       stubs/⍨←found                                 ⍝ keep stubs that match
[28]       words/⍨←mask                                  ⍝ keep words that may yet match
[29]       mwords/⍨←mask                                 ⍝ keep matrix words that may yet match
[30]   :EndWhile
[31]   r←∪r
     ∇

Attacking the Problem

Intuitively, this felt like an iterative problem. A mostly-array-oriented solution might be to generate character vectors made up from the contents of all paths in board and then do a set intersection with words. But that would be horrifically inefficient – there are over 12 million paths through a 4×4 board and, in the case of b4, only 188 valid words. What about a recursive solution (many of the submissions used recursion)? I tend to avoid recursion unless there are clear advantages to using it, and in this case I didn’t see any advantages, clear or otherwise. So, iteration it was…

I decided to use two parallel structures to keep track of progress:

  • paths – the paths traversed through the board
  • stubs – the word “stubs” built from the contents of the cells in paths

paths is initialized to the indices of the board, and stubs is initialized to the contents of each cell. Then iterate:

  1. Add any stubs that appear in words to the result
  2. Extend each path with each of its unvisited neighboring cells, appending the corresponding cell contents to its stub
  3. Repeat until there are no paths left to extend or no candidate words left to match
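For readers who would like to see the shape of this approach outside APL, here is a rough Python sketch of the same paths-and-stubs iteration. It is an illustrative translation, not the APL code; the explicit prefix set stands in for the ∧.= prefix matching described later, and all names are my own:

```python
from itertools import product

def find_words(words, board):
    """Find all words formable from contiguous board cells without
    revisiting a cell. A cell may hold more than one letter."""
    rows, cols = len(board), len(board[0])
    wordset = set(words)
    # every prefix of every word, standing in for the ∧.= prefix test
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}

    def neighbors(r, c):
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                yield r + dr, c + dc

    # paths and stubs evolve in parallel, as in the APL version
    paths = [[(r, c)] for r in range(rows) for c in range(cols)]
    stubs = [board[r][c] for ((r, c),) in paths]
    found = {s for s in stubs if s in wordset}
    while paths:
        new_paths, new_stubs = [], []
        for path, stub in zip(paths, stubs):
            for cell in neighbors(*path[-1]):
                if cell in path:
                    continue                    # no revisiting cells
                ext = stub + board[cell[0]][cell[1]]
                if ext in wordset:
                    found.add(ext)              # r,←words∩stubs
                if ext in prefixes:             # cull dead-end stubs
                    new_paths.append(path + [cell])
                    new_stubs.append(ext)
        paths, stubs = new_paths, new_stubs
    return sorted(found)
```

On the 2×2 board from earlier, `find_words(['ou','our','thou','rough','routh','though','through'], [['th','r'],['ou','gh']])` returns all seven words.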

Setup

First, I need to find the adjacent cells for each cell in board.

[1]    inds←⍳⍴board                                      ⍝ board indices
[2]    neighbors←(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0       ⍝ matrix of neighbors for each cell

You might recognize line [2] as a stencil-like (⌺) operation. Why, then, didn’t I use stencil? To be honest, it didn’t occur to me at the time – I knew how to code what I needed without using stencil. As it turns out, for this application, stencil is slower. The stencil expression is shorter, more “elegant”, and possibly more readable (assuming you know how stencil works), but it takes more than twice the time. Granted, this line only runs once per invocation so the performance improvement from not using it is minimal.

      inds←⍳4 4
      ]RunTime -c '(,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0' '{⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds'
                                                                                              
  (,inds)∘∩¨↓inds∘.+(,¯2+⍳3 3)~⊂0 0 → 2.2E¯5 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕                       
  {⊂(,⍺↓⍵)~(⍵[2;2])}⌺3 3⊢inds       → 5.0E¯5 | +127% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 

I wrote a helper function nextcells which, given a path, returns the unvisited cells adjacent to the last cell in the path. For example, if we have a path that starts at board[1;1] and continues to board[2;2], then the next unvisited cells for this path are given by:

      nextcells (1 1)(2 2)
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│┌───┐│
││1 2│││1 3│││2 1│││2 3│││3 1│││3 2│││3 3││
│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│└───┘│
└─────┴─────┴─────┴─────┴─────┴─────┴─────┘

The next step in the setup is a significant contributor to the improved performance: I created a parallel, transposed, matrix copy of words.

[6]    mwords←⍉↑words ⍝ matrix of candidate words, use a columnar matrix for faster ∧.=

Why create another version of words and why is it transposed?

  • In general, it’s faster to operate on simple arrays.
  • Simple arrays – arrays containing only flat, primitive data without any nested elements – are stored in a single, contiguous block of memory. Elements are laid out contiguously in row-major order, meaning the last dimension changes fastest: for a 2D matrix, the first row is stored left-to-right, then the second row, and so on. Transposing the word matrix makes prefix searching, as we look for candidates that could become valid words, much more efficient. Consider a matrix consisting of the words “THE” “BIG” “DOG”. If stored one word per row, the interpreter has to “skip” to find the first letter of each word. In a column-oriented matrix, however, the first letters are next to one another and likely to be in cache, making them much quicker to access.
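To see why, we can model the layout in Python. This is an illustrative sketch of row-major (“ravel”) order, not Dyalog’s actual memory management, and the words are made-up sample data:

```python
# pad words to equal length, as mix (↑) does in APL
words = ["the", "big", "dog", "bigger"]
width = max(map(len, words))
rows = [list(w.ljust(width)) for w in words]      # one word per row
mwords = [list(col) for col in zip(*rows)]        # transposed: one word per column

def ravel(m):
    """Flatten a matrix in row-major order, like monadic , in APL."""
    return [ch for row in m for ch in row]

# the row layout scatters the first letters; the transposed layout
# packs them together at the front of memory
print(ravel(rows)[:6])    # all of "the   " before any other word appears
print(ravel(mwords)[:4])  # the first letter of every word

# n↑mwords: the first n rows of the transposed matrix hold every
# word's n-letter prefix in one contiguous block
n = 3
prefixes = ["".join(col) for col in zip(*mwords[:n])]
print(prefixes)  # ['the', 'big', 'dog', 'big']
```

This is why `n↑mwords` in the APL code is so cheap: the n-letter prefixes of all the words are already sitting together at the start of the array.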

Things Run Faster If You Do Less Work

Smaller searches are generally faster than larger ones. If we pare down words and stubs as we progress, we will perform smaller searches. The first pass at minimizing the data to be searched is done during setup – we remove any words that don’t begin with the first letter of one of board’s cells, as well as words that contain letters not found in board:

[7]    mask←mwords[1;]∊⊃¨,board               ⍝ mark only those beginning with a letter on the board
[8]    mask←mask\∧⌿(mask/mwords)∊' ',∊board   ⍝ further mark only words containing only letters found on the board
[9]    words/⍨←mask                           ⍝ keep those words
[10]   mwords/⍨←mask                          ⍝ keep them in the matrix form as well

For board b4, this reduces the number of words to be searched from 267,752 to 16,247 – a ~94% reduction. Then we iterate, appending each path’s next unvisited cells and creating new stubs from the updated paths:

[13]       next←nextcells¨paths                      ⍝ get the next cells for each path
[14]       paths←⊃,/(⊂¨paths),¨¨next                 ⍝ append the next cells to each path
[15]       stubs←⊃,/stubs{⍺∘,¨board[⍵]}¨next         ⍝ append the next letters to each stub

Append any stubs that are in words to the result:

[16]       r,←words∩stubs                                ⍝ add any matching words

Because a cell can have more than one letter, we might have stubs of different lengths, so we need to iterate over each unique length:

[19]       lens←≢¨stubs           ⍝ length of each stub
[20]       :For n :In ∪lens       ⍝ for each unique stub length

Because we’re doing prefix searching, the inner product ∧.= can tell us which stubs match prefixes of which words. Now we can see the reason for creating mwords. Since the data in mwords is stored in “raveled” format, n↑mwords quickly returns a matrix of all n-length prefixes of words:

[21]           m←n=lens                    ⍝ mark stubs of this length
[22]           map←(↑m/stubs)∧.=n↑mwords   ⍝ map which stubs match which word beginnings
[23]           mask∨←∨⌿map                 ⍝ words that match
[24]           found[(∨/map)/⍸m]←1         ⍝ stubs that match
[25]       :EndFor

We then use our two Boolean arrays, found and mask, to pare down paths/stubs and words/mwords respectively. If we look at the number of words and stubs at each step, we can see that the biggest performance gain is realized by doing less work:

┌──────────────────┬───────┬──────┐
│Phase             │≢words │≢stubs│
├──────────────────┼───────┼──────┤
│Initial List      │267,752│     0│
├──────────────────┼───────┼──────┤
│After Initial Cull│ 16,247│    16│
├──────────────────┼───────┼──────┤
│After 2-cell Cull │  7,997│    56│
├──────────────────┼───────┼──────┤
│After 3-cell Cull │  2,736│   152│
├──────────────────┼───────┼──────┤
│After 4-cell Cull │  1,159│   178│
├──────────────────┼───────┼──────┤
│After 5-cell Cull │    371│   119│
├──────────────────┼───────┼──────┤
│After 6-cell Cull │     87│    42│
├──────────────────┼───────┼──────┤
│After 7-cell Cull │     16│    10│
├──────────────────┼───────┼──────┤
│After 8-cell Cull │      2│     1│
├──────────────────┼───────┼──────┤
│All Done          │      0│     0│
└──────────────────┴───────┴──────┘

Does mwords Make Much of a Difference?

As an experiment, I decided to write a version, f10, that does not use the transposed word matrix mwords (it still does the words and stubs culling). I compared it to my original version, f0, and the fastest submitted version, f1:

      0 1 10 time 'b4'
  f0  b4 → 1.4E¯2 |     0% ⎕⎕                                       
  f1  b4 → 2.2E¯1 | +1551% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  f10 b4 → 2.1E¯1 | +1422% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 

Interestingly, f10 performed remarkably close to f1. When I looked at the code for f1, I saw that the author had implemented a similar culling approach and had commented in several places that the construct was there to improve performance. Good job! But this does demonstrate that maintaining a parallel, simple copy of words makes the solution run about 15× faster.

Takeaways

There are a couple of other optimizations I could have implemented:

  • In the setup, I could have filtered out all words that were longer than ≢∊board.
  • If FindWords were used a lot, and I could be fairly certain that words was static (unchanging), then I could create mwords outside of FindWords. The line that creates mwords consumes about half of the total running time of the function.

When thinking about performance and optimization:

  • Unless there’s an overwhelming reason to do so – don’t sacrifice code clarity for performance. If you implement non-obvious performance improvements, note them in comments or documentation.
  • Optimize effectively – infinitely speeding up a piece of code that contributes 1% to an application’s CPU consumption makes no real impact.
  • Consider how your data is structured and how that might affect performance. In this case, representing words as a vector of words is convenient, readable, and there aren’t those extra spaces that might occur in a matrix format. But as we saw, it performs poorly compared to a simple matrix format. Don’t be afraid to make the data conform to a more performant organization.
  • Along similar lines, consider how simple arrays are stored in contiguous memory and whether you can take advantage of that.

In case you were wondering, the two solutions Rich Park looked at in his webinar were f4 and f7 in the timings above. The fastest submission, f1, was submitted by Julian Witte. Please remember that we did not specify performance as a criterion for the problem, so this is in no way a criticism of any of the submissions.

If you’re curious to look at the code for the submissions, this .zip file includes namespaces f0–f9, each of which contains a FindWords function and any needed subordinate functions (in addition to the solution namespaces, the .zip file also includes AllWords, the time function and 4 sample boards – b2, b3, b4, and b6). You can extract and use the code as follows:

  1. Unzip Submissions.zip to a directory of your choosing.
  2. In your APL session, enter:
    ]Link.Import # {the directory you chose}/Submissions
    This step might take several seconds while Link.Import brings in AllWords.

You can then examine the code, run your own timings, and so on. One interesting thing to explore is which submissions properly handle 1×1 and 0×0 boards.

Postscript

When I started to write the explanation of my code, it occurred to me: “This is 2026 and we have LLMs that might be able to explain the code. Let’s give them a try…”

So, I asked each of Anthropic’s Claude Opus 4.6 Extended, Google’s Gemini Pro, and Microsoft’s Copilot Think Deeper the following:

Explain the attached code. Note that a cell in board can have multiple letters like “qu” or “ough”. Also note that the words list is the official scrabble words list and has no single letter words.

The results were interesting and, in several places, a more concise and coherent explanation than I might produce. But how accurate and useful were their explanations? Stay tuned for a blog post about how well different LLMs explain APL code!

Outperforming Nested Arrays with Classic APL Techniques – Part 1

Let me take you back to the 1970s. We’re playing Space Invaders in the arcade, watching Star Wars in the cinema, and listening to David Bowie in our Minis. When we come home and open our terminals to use APL, all of our arrays are flat. Nested arrays will not be part of a commercial APL implementation until NARS is released in 1981. Performing computations on, say, a list of names, is not as straightforward as we might hope!

We might choose to keep each name in a row of a matrix. For example:

      M←4 7⍴'Alice  Bob    CharlieBen    '
      M
Alice
Bob
Charlie
Ben

Alternatively, we might choose to delimit each name with a ';' (or another suitable delimiter). For example:

      V←';Alice;Bob;Charlie;Ben'

If we want to count the number of names beginning with a 'B', we can’t simply call +/'B'=⊃¨names, but have to think about our representation:

      +/'B'=M[;⎕IO]
2
      +/'B'=(¯1⌽V=';')/V
2

We don’t know it yet, but 50 years later, these types of expressions will have excellent performance on the computers of the day. They will also be the key to outperforming the nested arrays that will be introduced in the 1980s.

      ⍝ let's use bigger data
      M←10000000 7⍴M
      V←(2500000×⍴V)⍴V
      N←10000000⍴'Alice' 'Bob' 'Charlie' 'Ben'

      ⍝ see how much faster the non-nested versions are!
      ]Runtime -c "+/'B'=M[;⎕IO]" "+/'B'=(¯1⌽V=';')/V" "+/'B'=⊃¨N"

  +/'B'=M[;⎕IO]      → 4.1E¯3 |     0% ⎕                                        
  +/'B'=(¯1⌽V=';')/V → 1.2E¯2 |  +192% ⎕⎕                                       
  +/'B'=⊃¨N          → 2.8E¯1 | +6863% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The above example is an illustrative benchmark, but it shows what’s possible.

In this blog post, I’m going to explore various techniques to leverage flat representations of nested data. I’ll look at querying and manipulating these representations, and see how the concrete array representation that Dyalog APL uses affects performance.

How Arrays are Stored

The Dyalog interpreter stores your flat arrays in memory as a header (which includes some bookkeeping information needed by the interpreter) followed by the shape of the array, and then the contents of the array in ravel order. For example, 2 3⍴⎕A looks like this:

┌─────┬─────┬─────────────┐
│ ... │ 2 3 │ A B C D E F │
└─────┴─────┴─────────────┘

Nested arrays are more complicated. Every element is stored separately, potentially at a distant location in memory. A nested array is still stored with a header and shape, but the body consists of addresses of the array’s elements, rather than the elements themselves. These addresses tell the interpreter where in the workspace to find each element of the nested array. This means that the array 'ABC' 'DEF' looks like this:

            ┌──────────┐
            │          ↓
┌─────┬───┬─│───┐     ┌─────┬───┬───────┐     ┌─────┬───┬───────┐
│ ... │ 2 │ * * │ ... │ ... │ 3 │ A B C │ ... │ ... │ 3 │ D E F │
└─────┴───┴───│─┘     └─────┴───┴───────┘     └─────┴───┴───────┘
              │                                ↑
              └────────────────────────────────┘

Sometimes, the interpreter can detect that you’re reusing an array for multiple elements, and refer to it multiple times, rather than copying. For example, 4⍴⊂'ABC' is stored as:

            ┌─┬─┬─┬────────┐
            │ │ │ │        ↓
┌─────┬───┬─│─│─│─│─┐     ┌─────┬───┬───────┐
│ ... │ 4 │ * * * * │ ... │ ... │ 3 │ A B C │
└─────┴───┴─────────┘     └─────┴───┴───────┘

This layout has a few consequences for us:

  • Nested arrays need to store extra information for each of their elements – the header and the shape. Although this is partially mitigated by the trick above, the space requirement quickly grows if you have many nested elements!
  • Accessing the elements of nested arrays can be slow. Modern hardware is optimised under the assumption that you won’t start doing work far away from where you are already working. However, when you access an element of a nested array, that element could be stored very far away in the workspace, so your computer could take a while to load it.

When looking at the performance of algorithms involving arrays, doing as much as possible to reduce nesting will often yield good results.

Partitioned Vectors

Although I looked at using matrices in the previous section, I’m going to focus on using partitioned vectors from now on. Using matrices mostly involves multiple references to the rank operator (⍤), while using a partitioned vector is much more interesting.

There are many ways to represent the partitioning of a vector into multiple sub-vectors. I’ve already shown one way – delimiting the sub-vectors with a character that does not itself appear in any sub-vector:

      V←';Alice;Bob;Charlie;Ben'

Here, a delimiter precedes the content of each sub-vector. This is very useful; if I need to know the delimiter, I can find it with ⊃V. I could also place the delimiter after each sub-vector, making it easy to convert between these representations with a rotate:

      1⌽V
Alice;Bob;Charlie;Ben;

This is the format you get from reading a file into a simple character vector with ⎕NGET 'filename', with linefeed characters (⎕UCS 10) taking the place of semicolons as the trailing delimiters.

You can also store the partitions in a separate array from the sub-vectors. For example, you could use a Boolean mask to indicate the start of each sub-vector:

      data←'AliceBobCharlieBen'
      parts←1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0
      [data ⋄ parts]
A l i c e B o b C h a r l i e B e n
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0

Note: this example uses array notation, a new feature available in Dyalog v20.0. Using array notation, [a ⋄ b ⋄ c] defines an array with a, b, and c as major cells. This is convenient when viewing vectors whose contents align.

One benefit of this format is that you can include any character in a sub-vector without worrying about cutting it in half by including a delimiter. I will return to this way of representing partitions later, but it’s not the only option. For an in-depth exploration, see this essay on the APL Wiki.

Partitioned Searching

Before I continue, I want to examine some timings, so I need some large, random data to work with. I’m going to use the list of English words used for spell-checking on my machine. I’ll use ⎕C to case-fold each word so that I can ignore case. This list is ordered alphabetically, but I’m not going to take advantage of that here.

      N←⎕C¨⊃⎕NGET'words.txt'1  ⍝ load the words as a nested vector
      ⍴N                       ⍝ how many are there
479823
      V←∊';',¨N                ⍝ delimited vector to work with
      100↑V                    ⍝ what does it look like
;1080;10-point;10th;11-point;12-point;16-point;18-point;1st;2;20-point;2,4,5

You can also use ⎕NGET to load the words into a flat array directly, which is significantly faster.

      ]Runtime -c "∊';',¨⊃⎕NGET'words.txt'1" "';'@{⍵=⎕UCS 10}¯1⌽⊃⎕NGET'words.txt'"
                                                                                               
  ∊';',¨⊃⎕NGET'words.txt'1            → 5.3E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  ';'@{⍵=⎕UCS 10}¯1⌽⊃⎕NGET'words.txt' → 2.8E¯2 | -48% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕   

Text is not the only type of data that you might want to store efficiently; for example, you might need to store DNA strings or numeric vectors. However, I’m going to use text in this example.

I’ve already demonstrated a very basic example of searching a partitioned vector for a pattern – I counted the number of names beginning with 'B'. Let’s investigate how that really worked. I’ll start by finding a mask of the delimiter preceding each word:

      m←V=';'
      20(↑⍤1)[V ⋄ m]
; 1 0 8 0 ; 1 0 - p o i n t ; 1 0 t h ;
1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1

If I rotate this mask to the right by one place, it now corresponds to the first letter of each word:

      20(↑⍤1)[V ⋄ ¯1⌽m]
; 1 0 8 0 ; 1 0 - p o i n t ; 1 0 t h ;
0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0

I can use this rotated mask to pick out the first character of each word; it then becomes easy to count the words beginning with 'b' (not 'B' anymore, as I case-folded the words):

      50↑(¯1⌽m)/V     ⍝ first character of each of the first 50 words
111111112222223333334444455566778899-aaaaaaaaaaaaa
      +/'b'=(¯1⌽m)/V  ⍝ number of words that start with a 'b'
25192
      +/'b'=⊃¨N       ⍝ double check answer against nested
25192

Now for something more complicated. Say I want to count the number of words beginning with the prefix 'con'. I could reuse the previous technique to find the first character of each word, and tweak it to find the second and third characters as well:

      ⍝ ┌─ starts c ─┐ ┌── then o ──┐ ┌── then n ──┐
      +/('c'=(¯1⌽m)/V)∧('o'=(¯2⌽m)/V)∧('n'=(¯3⌽m)/V)
3440

I need to be careful here! Some words in the list are shorter than 3 letters, so when I rotate the mask, I start picking from the following word. Fortunately, I’ve included the delimiters, so when I cross a word boundary, one of the tests evaluates to false. If I stored the partition representation separately from the data, I would need to handle this case explicitly.

There’s a nice way to use find (⍷) to solve this problem. As the start of a word is explicitly encoded by a ';' in our data, I can search directly for the start of a word followed by 'con'.

      +/';con'⍷V
3440

Both of these methods are much faster than acting on the nested data:

      ]Runtime -c "+/{'con'≡3↑⍵}¨N" "+/('c'=(¯1⌽m)/V)∧('o'=(¯2⌽m)/V)∧('n'=(¯3⌽m)/V)⊣m←V=';'" "+/';con'⍷V"

  +/{'con'≡3↑⍵}¨N                                        → 4.3E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  +/('c'=(¯1⌽m)/V)∧('o'=(¯2⌽m)/V)∧('n'=(¯3⌽m)/V)⊣m←V=';' → 2.5E¯3 | -95% ⎕⎕                                       
  +/';con'⍷V                                             → 3.3E¯3 | -93% ⎕⎕⎕

Great, I can count things at the start of words… but things get really interesting when I want to count things occurring anywhere in a word, because then I need to get creative about using delimiters as anchors.

Say I want to count the number of words which contain the letter 'a'. I can get a mask of all the occurrences of it, but as a word can contain more than one 'a', I have to be clever about counting. I’ll select some test data to see what’s going on:

      test←';abc;xyz;banana'

There are several ways to count the words here that contain an 'a'. The first is to use a scan of the occurrences of the delimiter to give a unique identifier to each word. I can then use the occurrences of 'a' to select these IDs. The number of words that contain an 'a' is then the number of unique IDs:

      [
            test        ⍝ data
            test=';'    ⍝ mask of delimiters
            +\test=';'  ⍝ word IDs
            test='a'    ⍝ mask of 'a's
      ]
; a b c ; x y z ; b a n a n a
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
1 1 1 1 2 2 2 2 3 3 3 3 3 3 3
0 1 0 0 0 0 0 0 0 0 1 0 1 0 1
      (test='a')/+\test=';'    ⍝ word IDs of each 'a' (one in 'abc', three in 'banana')
1 3 3 3
      ≢∪(test='a')/+\test=';'  ⍝ number of unique IDs
2

Note: this example again uses array notation, a new feature available in Dyalog v20.0. Line breaks can be used in place of ⋄ separators in array notation; I have used this option here to evaluate each row of a matrix on its own line.

I want to check that this also works on large input:

      ⍝ try it on the large input
      ≢∪(V='a')/+\V=';'
281193
      ⍝ double-check against the nested format
      +/∨/¨N='a'
281193
      ⍝ and it's faster
      ]Runtime -c "+/∨/¨N='a'" "≢∪(V='a')/+\V=';'"
                                                                             
  +/∨/¨N='a'        → 5.2E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
  ≢∪(V='a')/+\V=';' → 4.6E¯3 | -91% ⎕⎕⎕⎕

That’s one way to solve this problem, but there are many more and I’m sure the interested reader will come up with their own. Here are a few alternatives for inspiration:

      ≢∪(V=';')⍸⍥⍸V='a'
281193
      +/';a'⍷V∩'a;'
281193
      +/1≠¯2-/(⍸,⎕IO+≢)';'=V∩'a;'
281193
      +/2</0,(1⌽V=';')/+\V='a'
281193
      (V=';'){+/(⍺/⍵)≥a/1⌽a←⍺/⍨⍵∨⍺}V='a'
281193

I’m going to have a closer look at how the last two of these work in part 2 of this post. For now, I’ll move beyond counting and do some structural manipulation with this format.

Partitioned Manipulation

Filtering

Some manipulations on the delimited format are as easy as they would be with the nested format. Say you’re interested in the distribution of vowels among words (a, e, i, o, and u in English). You might then want to pare down your list of words to include only the vowels. With our delimited format, this is simple:

      test←';the;quick;brown;fox;jumps;over;the;lazy;dog'
      (test∊';aeiou')/test  ⍝ one way
;e;ui;o;o;u;oe;e;a;o
      test∩';aeiou'         ⍝ another way
;e;ui;o;o;u;oe;e;a;o

Besides remembering to preserve the delimiter, there’s nothing tricky going on here. Don’t get too comfortable though, things are about to get trickier!

So that I can continue to check my answers against the nested representation, I will define a utility function to split a delimited vector into a nested representation:

      Split←{1↓¨(⍵=';')⊂⍵}
      Split test 
┌───┬─────┬─────┬───┬─────┬────┬───┬────┬───┐
│the│quick│brown│fox│jumps│over│the│lazy│dog│
└───┴─────┴─────┴───┴─────┴────┴───┴────┴───┘

I can now test filtering against the nested representation:

      (Split V∩';aeiou')≡(N∩¨⊂'aeiou')
1

You can also write Split as {(⍵≠';')⊆⍵}, but this fails on some edge cases. Can you see which ones?

As well as filtering the letters of a word, I might want to filter words out of the whole list. I could do this filter on any condition; for now, I’ll use a condition that I know how to compute already and filter for words that begin with 'con'. Once I have identified the places in the delimited vector that indicate words beginning with 'con', I can find the word ID for each of those places, and use those IDs to construct a mask to filter the data:

      test←';banana;cons;conman;apple;convey'
      ids←+\test=';'                  ⍝ word IDs
      (';con'⍷test)/ids               ⍝ IDs of words that start with 'con'
2 3 5
      m←ids∊(';con'⍷test)/ids         ⍝ mask of words that start with 'con'
      [test ⋄ ';con'⍷test ⋄ ids ⋄ m]  ⍝ see how those line up
; b a n a n a ; c o n s ; c o n m a n ; a p p l e ; c o n v e y
0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5 5
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1
      m/test  ⍝ only words that start with 'con'
;cons;conman;convey
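The same word-ID filtering, sketched in Python for anyone who wants to see the mechanism without the APL primitives (again, just an illustration of the technique):

```python
# Mirroring m←ids∊(';con'⍷test)/ids : keep every character whose word ID
# belongs to a word at which ';con' matches.
test = ";banana;cons;conman;apple;convey"

ids, wid = [], 0
for ch in test:          # +\test=';' : word ID per character
    if ch == ';':
        wid += 1
    ids.append(wid)

# word IDs at each position where ';con' begins a match (';con'⍷test)
hits = {ids[i] for i in range(len(test)) if test.startswith(";con", i)}

print("".join(c for c, w in zip(test, ids) if w in hits))  # → ;cons;conman;convey
```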

I’ll try it on the big data and compare the result to the nested version:

      answer←V/⍨ids∊(';con'⍷V)/ids←+\V=';'
      (Split answer)≡{'con'≡3↑⍵}¨⍛/N
1

Note: this example uses the behind operator (⍛), a new feature available in Dyalog v20.0.

This method seems to work perfectly, but what is the performance like?

      ]Runtime -c "{'con'≡3↑⍵}¨⍛/N" "V/⍨ids∊(';con'⍷V)/ids←+\V=';'"
                                                                                         
  {'con'≡3↑⍵}¨⍛/N               → 3.8E¯2 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
* V/⍨ids∊(';con'⍷V)/ids←+\V=';' → 1.1E¯2 | -73% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Note the * by the delimited version. This indicates that its result differs from the result of the nested version. That is to be expected: I did not include the Split in the timings, so the result formats are different. I know from the check above that the contents match.

Reversing

Continuing the tour of miscellaneous things that you might like to do with a delimited vector, let’s look at reversal. With filtering, filtering the contents of each word turned out to be less complicated than filtering whole words. With reversal, the reverse (haha!) is true. The technique to reverse each word builds on the technique to reverse the order of the words.

The key will be the vector of word IDs that I’ve used before. To reverse the order of the words, I can use the grade of the word IDs directly to sort our data. This works because word IDs are increasing along the vector, so by grading the IDs down, I fetch larger IDs (later words) to the start of the result. It also relies on grade (⍒) being stable, meaning that the order of elements with the same value (that is, letters within a word) is preserved.

      test←';first;second;third'
      ids←+\test=';'
      test[⍒ids]
;third;second;first

      ⍝ how everything lines up
      [test ⋄ ids ⋄ ⍒ids ⋄ test[⍒ids]]
 ;  f  i  r  s  t ; s e  c  o  n  d ; t h i r d
 1  1  1  1  1  1 2 2 2  2  2  2  2 3 3 3 3 3 3
14 15 16 17 18 19 7 8 9 10 11 12 13 1 2 3 4 5 6
 ;  t  h  i  r  d ; s e  c  o  n  d ; f i r s t

I can build on this to reverse the letters of each word. Notice that if I reverse this result, I get almost exactly what I need; each word is back in its original position, with its letters in the reverse order:

      ⌽test[⍒ids]
tsrif;dnoces;driht;

The only issue is that the delimiters are now trailing, rather than leading, but that’s easily fixed with a rotate:

      ¯1⌽⌽test[⍒ids]
;tsrif;dnoces;driht

As a matter of taste, I prefer to do all the manipulation on the grade vector rather than on the result:

      test[¯1⌽⌽⍒ids]
;tsrif;dnoces;driht
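To make the grade trick concrete, here it is reproduced in Python, where a stable sort plays the role of ⍒ (this sketch is only illustrative):

```python
# Mirroring test[¯1⌽⌽⍒ids] : grade the word IDs downward (stably),
# reverse that permutation, then rotate it one step to the right.
test = ";first;second;third"

ids, wid = [], 0
for ch in test:                 # +\test=';' : word ID per character
    if ch == ';':
        wid += 1
    ids.append(wid)

# ⍒ids : indices by descending word ID; Python's sort is stable, so
# letters within a word keep their original order
perm = sorted(range(len(test)), key=lambda i: -ids[i])
perm = perm[::-1]               # ⌽ : reverse the permutation
perm = perm[-1:] + perm[:-1]    # ¯1⌽ : rotate right so ';' leads each word

print("".join(test[i] for i in perm))  # → ;tsrif;dnoces;driht
```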

Isn’t that nice? When I first thought about doing real manipulations on this format, I didn’t expect the code to be so simple. For completeness, here are the usual checks and timings:

      ⍝ check the results are correct
      (⌽N)≡Split V[⍒+\V=';']
1
      (⌽¨N)≡Split V[¯1⌽⌽⍒+\V=';']
1
      ⍝ look at the runtimes
      ]Runtime -c "⌽N" "V[⍒+\V=';']"
                                                                        
  ⌽N          → 2.9E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕                                 
* V[⍒+\V=';'] → 1.5E¯2 | +419% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
      ]Runtime -c "⌽¨N" "V[¯1⌽⌽⍒+\V=';']"
                                                                          
  ⌽¨N             → 1.7E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
* V[¯1⌽⌽⍒+\V=';'] → 1.8E¯2 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 

What’s happening here? Our fancy flat techniques are supposed to give us superior performance! Well, recall how arrays are stored. To reverse a nested array, all the interpreter has to do is reverse the addresses of the nested elements – it doesn’t have to touch the contents of those elements at all:

┌─────┬───┬───────┐             ┌─────┬───┬───────┐
│ ... │ 3 │ * * * │  →becomes→  │ ... │ 3 │ * * * │
└─────┴───┴─│─│─│─┘             └─────┴───┴─│─│─│─┘
          ┌─┘ │ └─┐                         └─│─│─┐
          │   │   │                       ┌───│─┘ │
          ↓   ↓   ↓                       ↓   ↓   ↓
         ┌─┐ ┌─┐ ┌─┐                     ┌─┐ ┌─┐ ┌─┐
         │A│ │B│ │C│                     │A│ │B│ │C│
         └─┘ └─┘ └─┘                     └─┘ └─┘ └─┘

By contrast, our flat version needs to process every letter of every word. It also needs to perform a grade, which is relatively expensive.

The difference evens out when I record the time taken to reverse each word. This is likely because the nested version now has to traverse every nested element doing a reversal, while the flat version only needs to do the relatively cheap extra work of ¯1⌽⌽.

To Be Continued…

In conclusion, this flat format is not a magic bullet for performance; it really depends on exactly what you want to do with it. If you’re doing a lot of searching, then using a flat format might be what you need, but if you’re doing more manipulating, then a nested format might be better. The only way to know is to write the code and test the performance on representative data!

I’ve covered some interesting ways to process character data in this flat format, but it doesn’t stop there! I haven’t yet touched on numeric or Boolean data at all, and that’s where things get really interesting. If we have a partitioned vector, how might we sum each sub-vector? How might we do a plus-scan on each sub-vector? Will this be faster or slower than nested equivalents? The second part of this post (coming soon!) will give the answers to these questions and more.

Taking Back Ctrl: Keyboard Shortcuts in the Dyalog IDEs

In this blog post I describe:

  • how to use standard Ctrl keyboard shortcuts when using Dyalog on all platforms.
  • little-known APL input features for Microsoft Windows.
  • alternative input methods that do not use the Ctrl key for APL input.

I don’t personally use the official Dyalog IME for Microsoft Windows as my daily APL input method. My only complaints are that it forces me to use the Ctrl key for APL input, which conflicts with conventional shortcuts for most applications, and for some reason it puts an additional item in my input method list because I have a Japanese input enabled, even though that is not a supported layout for the IME!

I am a fan of Adám’s AltGr keyboard for Windows for a couple of reasons. It is based on the Microsoft Keyboard Layout Creator (MSKLC), and appears to work in newer Universal Windows Platform (UWP) apps and with the default On-Screen Keyboard on Windows. It also has accents and box-drawing characters. However, support might be unreliable, because Microsoft do not officially support MSKLC on Windows 11, and there are rumoured issues with using it on ARM machines. Its main drawback is the limited choice of shifting keys, which can make it difficult to use with certain keyboard layouts or in certain applications. Personally, I am fond of using Caps Lock as my primary APL shifting key, because it is rare these days that I need to TYPE A BUNCH OF THINGS IN ALL CAPITALS – it feels like shouting.

I used to do this by mapping Caps Lock to Alt Gr through some registry trickery. However, Kanata allows me to use almost any keys I like. I now have Caps Lock and Right Ctrl as APL shifting keys. In addition, I can double-tap Caps Lock to toggle it, so I can SHOUT OVER TEXT if I really need to. It’s cross-platform, so my single APL Kanata configuration gives me a consistent experience even when using my Linux virtual machine. There are still many features on my wish list for a comprehensive input method editor, but this is quite good for now (I wonder if anybody is able to adapt mozc for APL?).

Whichever keyboard input method you use, once you are not using Ctrl for APL input you will be very glad to get back your beloved, conventional keyboard shortcuts for use in the Dyalog editor.

In the Dyalog IDE for Microsoft Windows, you can set keyboard shortcuts by selecting Options > Configure in the Session’s menu bar, then selecting the Keyboard Shortcuts tab. From here, you can click on a code that corresponds to an action for which you want to set a shortcut.

Dyalog IME for Microsoft Windows Configuration Dialog

These are keyboard shortcuts recognised by the Dyalog development environment.

I set:

  • <FX> to Ctrl+S to save changes in the editor without closing it
  • <SA> to Ctrl+A to select all text in a window
  • <CP> to Ctrl+C to copy text
  • <CT> to Ctrl+X to cut text
  • <PT> to Ctrl+V to paste text
  • <SC> to Ctrl+F to search and find text in the session log, editor, and debugger
  • <RP> to Ctrl+H to search and replace text in the session log, editor, and debugger

To help find these in the list, click on the Code column title to put them in alphabetical order.

In Ride, only <FX> needs to be set, as the other shortcuts should default to this behaviour.

Having said all that, the official Dyalog IME still has features that would be invaluable especially if I were a brand new user. You can read about these in the Dyalog for Microsoft Windows UI guide and configure them using the IDE by going to Options > Configure > Unicode Input > Configure Layout. A little-known tip is that its backtick (also known as introducer or prefix) input method offers a searchable keyword menu, so that you can find glyphs by name or by the name of the function or operator it represents.

After setting the prefix introducer key to # (octothorpe / hash), pressing it twice and typing “rot” shows suggestions for the glyphs ⌽ and ⊖ with the keywords “rotate” and “rotatefirst”.

There’s also the wonderful overstrike feature. Once upon a time, APL was executed on remote machines accessed through terminals based on the IBM Selectric typewriter; you can see this in this demonstration from 1975. Certain characters would be composed by typing one symbol over another, such as printing a vertical bar (|) over a circle (○) to produce the circle-bar glyph for rotate/reverse (⌽).

Its basic usage is a feature of the IDE for Microsoft Windows; the IME isn’t required at all. If you press the Ins key on your keyboard, it usually toggles “insert” or “overtype” mode. This is typically indicated by the vertical bar text cursor changing into an underbar. Typing will then replace existing text indicated by the cursor. With overstrikes enabled, APL glyphs can be created using other glyphs!

APL Glyph Overstrike Animation

Insert, Slash (/), LeftArrow, Dash (-), Circle (○), LeftArrow, Stile (|)

The IME allows you to use this in applications other than Dyalog by ticking Enable Overstrikes from the IME settings. You can read the documentation to learn more, but the basic mechanism is to use the Overstrike Introducer Key (default Ctrl+Backspace) followed by entering the two symbols to overstrike.

Settings menu for the Dyalog Input Method Editor

For this post I just wanted to highlight aspects of APL input beyond the simple defaults. For more about using your keyboard with Dyalog, see Adám’s blog post about enhanced debugging keys.

Advent of Code 2025 – Day 1 in APL

Advent of Code is an annual challenge in which a new programming puzzle is released every day for the first 12 days of December. We always see some great solutions in Dyalog APL. I’m going to look at the first problem, drawing on my own solution and some of those that have been shared across various message boards and in the APL Wiki.

Parsing

Day 1’s problem tasked us with uncovering the password to enter the North Pole by counting how often a dial points to zero in a sequence of rotations. Before we can solve the problem, we have to parse the input file we’re given. If you’re not interested in the parsing, you can move straight to Part One, but you’ll miss a cool new way to use the ⎕-functions!

The standard way to read a text file is to use the ⎕NGET (Native file GET) system function. A little-known addition to ⎕NGET, available in the recent Dyalog v20.0 release, is the ability to load a file directly into a matrix. This is faster than loading each line as a character vector and mixing them together, and is very handy for Advent of Code problems, which often have rectangular input files.

To parse the numbers given in the input file, it’s easy to use ⍎ (execute) to evaluate the text for each number as APL code, yielding the number as a result. For a fun exercise like Advent of Code, this is fine, but in production code it can be very dangerous. If you’re ⍎-ing text provided by a user, they could provide a malicious input like ⎕OFF, or worse! This is why the system function ⎕VFI (Verify Fix Input) exists: it parses numbers and only numbers. ⎕VFI also allows you to configure the separators between numbers. For instance, you could use ','⎕VFI to parse numbers separated by commas.

Using ⎕NGET and ⎕VFI together gives a very neat way to parse the input, abusing the Ls and Rs as delimiters for the numbers.

      ⎕IO←0
      input←⊃⎕NGET 'example.txt' 2 ⍝ provide the flag 2, to load the file as a matrix
      input
L68
L30
R48
L5
R60
L55
L1
L99
R14
L82
      1↓,input ⍝ our input numbers are delimited by 'L's and 'R's
68L30R48L5 R60L55L1 L99R14L82
      size←1⊃'LR'⎕VFI 1↓,input ⍝ use ⎕VFI, telling it that 'L' and 'R' are the separators
      size
68 30 48 5 60 55 1 99 14 82
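The ⎕VFI call has no direct Python analogue, but a regular-expression split captures the same “letters as separators” idea (a sketch using the example input, not production parsing code):

```python
# Mirroring 'LR'⎕VFI 1↓,input : treat 'L', 'R' (and the line breaks that
# stand in for the ravelled matrix padding) as separators, keeping only
# the numbers.
import re

text = "L68\nL30\nR48\nL5\nR60\nL55\nL1\nL99\nR14\nL82\n"
size = [int(n) for n in re.split(r"[LR\s]+", text) if n]
print(size)  # → [68, 30, 48, 5, 60, 55, 1, 99, 14, 82]
```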

This is much faster than loading the input into separate lines, and executing the number on each one.

      ]Runtime -c "1⊃'LR'⎕VFI 1↓,⊃⎕NGET 'input.txt' 2" "(⍎1↓⊢)¨⊃⎕NGET 'input.txt' 1"

  1⊃'LR'⎕VFI 1↓,⊃⎕NGET 'input.txt' 2 → 6.9E¯4 |    0% ⎕⎕⎕⎕⎕
  (⍎1↓⊢)¨⊃⎕NGET 'input.txt' 1        → 5.9E¯3 | +754% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

In addition to the sizes, we also need to parse the direction of each rotation. This is a simple matter of comparing the first column of the input against 'L' or 'R', choosing a 1 for each 'R' (as a right turn is considered positive), and a ¯1 for each 'L' (as a left turn is considered negative). The classic way to do this is with ¯1*.

      input[;0]
LLRLRLLLRL
      direction←¯1*'L'=input[;0]
      direction
¯1 ¯1 1 ¯1 1 ¯1 ¯1 ¯1 1 ¯1

As an alternative, Reddit user u/light_switchy found the very nice train 'R'(=-≠) for this.

      'R'(=-≠)input[;0]
¯1 ¯1 1 ¯1 1 ¯1 ¯1 ¯1 1 ¯1

Tacit programming can be tricky to read. For the benefit of those of us who prefer explicit code, 'R'(=-≠)x is evaluated as ('R'=x)-('R'≠x), which is the same as ('R'=x)-('L'=x), since x is made up of only 'R's and 'L's.

Part One

Many APLers have solved this year’s first problem, and the approach is so natural that everybody’s solution had a similar shape. By multiplying the size and direction of each rotation, we can find the difference in position that each rotation makes:

      size×direction
¯68 ¯30 48 ¯5 60 ¯55 ¯1 ¯99 14 ¯82

We can then use a sum scan to find the cumulative position of the dial after each rotation, remembering to include the starting position (50):

      +\50,size×direction
50 ¯18 ¯48 0 ¯5 55 0 ¯1 ¯100 ¯86 ¯168

We are working with a circular dial, so instead of going above 99, or below 0, we want to wrap around the numbers on the dial. We can use residue (|) to fix this:

      100|+\50,size×direction
50 82 52 0 95 55 0 99 0 14 32

Finally, we are required to find the number of times the dial is left pointing at zero. Counting this is straightforward; the hard work has already been done:

      +/0=100|+\50,size×direction
3

Part Two

For part two of today’s problem, we are asked to find not only the number of times the dial finished a rotation on zero, but also the number of times any click of the dial points it to zero.

For the size of input we are given today, generating the position of the dial after every single click is an acceptable solution. You could do this with a few transformations, but several people found a nice trick to transform the code from part one into exactly what we need.

In part one, we represented the rotation L68 by the number ¯68, meaning 68 clicks in the negative direction. For this part, we want to represent each of those clicks independently, which we can do by replicating the direction by the size of the rotation, rather than multiplying it:

      size/direction
¯1 ¯1 ¯1 ¯1 ¯1 ¯1 ¯1 ¯1 ¯1 ¯1 ...

Using the same method as in part one, we can find the position of the dial after every single click, and count the number of times we see a zero:

      +/0=100|+\50,size/direction
6

This is a change of just one character from our part one solution, very elegant! Therefore, our solution for day one is:

      input←⊃⎕NGET 'input.txt' 2   ⍝ load the input as a matrix
      size←1⊃'LR'⎕VFI 1↓,input     ⍝ parse the size of each rotation
      direction←¯1*'L'=input[;0]   ⍝ find the direction of each rotation
      +/0=100|+\50,size×direction  ⍝ solve part 1
      +/0=100|+\50,size/direction  ⍝ solve part 2
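For comparison, here is the whole five-line solution transcribed into Python, with itertools.accumulate standing in for +\ (this mirrors the APL above using the example data; it is not the faster secret solution):

```python
from itertools import accumulate

rotations = ["L68", "L30", "R48", "L5", "R60", "L55", "L1", "L99", "R14", "L82"]
size = [int(r[1:]) for r in rotations]                     # size of each rotation
direction = [1 if r[0] == "R" else -1 for r in rotations]  # R positive, L negative

# part 1: dial position after each whole rotation, counting the zeros
steps = [s * d for s, d in zip(size, direction)]
part1 = sum(p % 100 == 0 for p in accumulate([50] + steps))
print(part1)  # → 3

# part 2: replicate each direction by its size, one step per click
clicks = [d for s, d in zip(size, direction) for _ in range(s)]
part2 = sum(p % 100 == 0 for p in accumulate([50] + clicks))
print(part2)  # → 6
```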

If you’re interested in a more performant solution, there is a way to avoid generating all the intermediate positions of the dial. Handling the edge cases is tricky, but it runs much faster. This blog post is long enough already, so I’ll leave determining how it works to you!

      ]Runtime -c "+/0=100|+\50,size/direction" "SecretFastSolution"

  +/0=100|+\50,size/direction → 1.2E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  SecretFastSolution          → 3.2E¯5 | -98% ⎕

Advent of Code 2025 is open for submissions until 12 December 2025, so it’s not too late to get involved! Trying the problems yourself, and then reading the approaches by others, is a fantastic way to learn new programming techniques. Happy APLing!

Welcome Asher Harvey-Smith

When he was young(er), Asher used to invent fictional languages for fantasy creatures. It should have been no surprise, then, that he found a love of designing and implementing programming languages when he began to experiment with computers. He studied computer science at the University of Warwick, where he discovered APL and was immediately hooked. After one cold email, two internships at Dyalog Ltd, and a master’s degree, Asher joined us officially as a developer.

Asher is working primarily on the interpreter, and is interested in both performance and correctness (two of our core values, so he fits in well!). Having implemented several small interpreters on his own, he is in awe at everything Dyalog APL can do, and eager to push it to even greater heights. He also loves to teach and present APL to new audiences, particularly when he can distill concepts into visuals and animations.

Outside work, Asher enjoys word games and board games. Once a keen chess player, he defended Dyalog Ltd’s honour against chess hustlers on the streets of New York whilst preparing for DYNA Fall 2025! He also goes running (when it is not too cold), and spends a lot of time on trains to visit his scattered circle of friends.

Welcome Martin Franck

One of the most formally-dressed members of Team Dyalog, Martin enjoys managing cloud infrastructure and helping everyone get even further with IT. He comes from a somewhat classical background, having worked his way up from changing printer toner to managing teams and working with service management and cloud operations at Maersk and Kraftvaerk. In his former roles he focused on building reliable, people-friendly IT environments. His approach to technology has always been simple: IT should make life easier, not more complicated. Processes, in his view, only have value if they help people to achieve something meaningful; otherwise, they’re just paperwork with better formatting. This philosophy is welcomed at Dyalog Ltd.

Before joining Dyalog Ltd, Martin had spent years discussing IT philosophies, the importance of good service, and occasionally how many gadgets one person can reasonably own with Stine, our CEO. When the opportunity came to turn those conversations into action, he joined the team and quickly discovered that working at Dyalog Ltd can be just as entertaining as it is technical. That said, one part of the job came with unintended consequences. During the hiring discussions, Martin asked Stine whether they could exercise together, to motivate him to get to the gym more often. This has now evolved into a twice-weekly gym partnership, which he suspects is less “light encouragement” and more “mandatory strength training”.

Like most people at Dyalog Ltd, Martin is not just an IT guy in a suit – he also has a secret identity! When leaving the office, Martin changes out the shirt for a hand-stitched tunic, turning into the owner of The Merchants Guild – his own medieval shop offering everything a 14th-century citizen could possibly need: clothing, tools, furniture, books, and the occasional spoon. Having spent more than twenty years in the Danish medieval re-enactment community, he founded the shop to give something back — and possibly to justify owning far more wooden chests than anyone reasonably should.

When not working, Martin is happiest creating something new, whether it’s a piece of furniture, a bit of code, or a castle in Minecraft. He enjoys the challenge of turning an idea into something tangible, preferably with a practical twist.

In many ways, that blend of curiosity and craftsmanship makes him a good fit for Dyalog Ltd. Whether he’s fine-tuning IT infrastructure or constructing a medieval bench, Martin approaches it all with the same philosophy: if it’s worth doing, it’s worth doing properly, preferably with a hint of character and a cold can of Coca-Cola.