A New Stack for the APL Challenge

In February 2026, the APL Challenge quietly launched its next round on a rewritten stack. The previous infrastructure was retired and replaced by something considerably smaller and, I think, much more pleasant to maintain. This blog post describes how we got here, and shows off some of the Dyalog v20.0 idioms that shaped the rewrite along the way.

A short tour of competition history

Dyalog Ltd has been running some form of programming contest for 17 years now, and almost every era has had its own stack.

2009-2012 – Email: The earliest competitions were essentially mailing lists: Brooke Allen seeded the very first World Wide Programming Competition with the first twenty Project Euler problems, and entrants emailed in their solutions. Subsequent International APL Programming Contest rounds described the tasks on the main Dyalog Ltd website. The 2011 winner, Joel Hough, observed that most popular programming languages had a “Try X” website that let newcomers experiment without installing anything, and that APL didn’t. Participation was restricted to those who could prove their student status; participants also needed to apply for a free educational licence. They could only access the full Dyalog product (and start solving problems) once the application was approved and they had downloaded and installed the system. Joel’s suggestion led directly to the creation of TryAPL, significantly lowering the bar to entry.

2013-2018 – StudentCompetitions/Sqore: Task descriptions and submissions moved onto the third-party platform StudentCompetitions (later renamed Sqore), and the title went through a period of flux, eventually settling on the APL Problem Solving Competition, emphasising APL as a problem solving tool over the mechanical programming aspects. The problems were split into two “phases”: Phase 1 consisted of ten “one-liner” problems while Phase 2 had a PDF specification of more complex problems and included a template for answering. There wasn’t much code involved on our side, but submissions arrived in inconsistent formats and weren’t sandboxed in any way, so each one had to be inspected (and possibly reformatted by hand) before it could be run. We used Phase 1 as a filter; you could only enter Phase 2 if you had correctly answered a minimum of one problem from Phase 1. We wrote ad-hoc code for testing at least parts of Phase 2. From 2014, we also allowed non-student participants, but they could only win a free trip to the next user meeting, and not a cash prize like the student winners.

2019-2021 – Back in-house: The contest moved onto Dyalog-grown infrastructure. Almost everything we could write in APL, we wrote in APL:

  • MiServer generated and served the website
  • HttpCommand talked to remote services
  • SMTP sent confirmation emails
  • DCL hashed passwords
  • Conga provided network communications for both MiServer and HttpCommand
  • DrA logged and reported errors to the developers
  • SQAPL communicated with a MariaDB database

This is what it looked like:

APL Problem Solving Competition screenshot

The sources were managed with Git and stored on GitHub in two repositories: one which held the MiSite code and styling, and one which held the per-round JSON test specifications and per-problem MiPages. The intention was that Jenkins’ continuous integration using Docker containers deployed using Rancher (and later Docker Swarm, when Rancher went all-in on Kubernetes) would make content updates hot-swappable into a running site. In practice, this never worked reliably and content pushes regularly required a service restart anyway, and the split added co-ordination overhead between the two repositories.

We used HAProxy for load balancing and reCAPTCHA to keep the bots out. Phase 1 validation was done by generating test scripts which were then sent to Try It Online for sandboxed execution. Phase 2 was a glorified file upload form. Brian Becker showed a simplified diagram of the moving parts in his Dyalog ’19 presentation:

Brian's diagram

2022-2023 – Enhancements: The Try It Online dependency was replaced with Safe Execute, itself derived from the original TryAPL code, and bespoke checks were added to verify that submitted Phase 2 entries followed the syntactic constraints of each task. Code quality still factored into the final score, so the automated tests were one input among several.

In his introduction to the competition prize-giving ceremony during Dyalog ’23, Brian outlined our thoughts for the future of the competition.

Enter the APL Challenge

In February 2024 we launched the first round of the APL Challenge, replacing the previous Phase 1. After a short break, Phase 2 also got a replacement: The APL Forge. The competition was now:

  • for everyone: While previous competitions exclusively or primarily targetted students, the APL Challenge is equally open to everyone. To facilitate participation by younger, new-to-APL, and foreign-language contestants, its descriptions of APL features and problems are deliberately written in a simplified style.
  • always open: Previously the competition was open for a few months each year. The APL Challenge is open (almost) continuously, with four rounds each lasting three months (with only a day’s down-time between them), and each round is followed by the awarding of prizes.
  • self-contained: Previously, you had to learn APL on your own to be able to participate. With the APL Challenge, each round teaches everything that is needed as part of the problem statements, building up to a more demanding tenth problem (often inspired by – or lifted directly from – the old Phase 1 archive).
  • entirely automated: Every submission is quickly and fully checked, with prizes awarded only based on correctness.

Initially, we recycled the existing server set-up, including the account system and the Phase 1 code for presenting problems and checking submissions. However, that code had already accumulated a significant amount of technical debt, and the changes needed to repurpose it didn’t help. MiServer’s performance characteristics meant that we had to run several parallel containers behind a load balancer, and the automated checker was slow because each submission was juggled across threads. When I showed the APL Challenge to a couple of classes at my children’s school, I also noticed that the account system was a significant barrier: the children either had no email addresses of their own, or couldn’t access them during school hours, which halted the sign-up flow before it began. Although none of this was really broken, it was a consideration for the list of “things to clean up when there’s time”.

The front end of the site was also redesigned to align more closely with modern styling:

MiServer-based APL Challenge screenshot

In late 2025 I finally found the time we needed. With Dyalog v20.0’s release approaching, I incorporated several of the key features introduced in that release into my rewrite – especially ⎕VGET, the enhancements to ⎕NS, array notation, and the behind operator (), but even the new ⎕SHELL found a use in the code base. Most of the code is fairly straightforward, but one section constructs APL expressions, including single-line dfns, at run time to put the participant’s solution into an appropriate testing context. Traditional debugging tools fell short here, but inline tracing came to the rescue. The result has been live since February 2026.

The new stack

We’d accumulated a lot of experience with Material for MkDocs from the online documentation overhaul for Dyalog v20.0 (compare it to the older v19.0 site it replaced) and from rewriting the APL Quest site, which hosts the old APL Problem Solving Competition Phase 1 problems. Both sites are static MkDocs builds, although the APL Quest does automated checking using an external fork of Attempt This Online, and both have proven straightforward to maintain. Reusing Material for MkDocs as the APL Challenge frontend was the obvious move. Combined with Jarvis for the backend and a small amount of dynamic content, the new stack is:

  • APL code that builds a static site for each round using Material for MkDocs.
  • Jarvis with one small enhancement, serving both the static site and a four-endpoint JSON API.
  • HttpCommand to forward subscription requests to MailerLite.
  • Conga for Jarvis and HttpCommand.
  • Git, GitHub, Docker Swarm, and Jenkins (as before).

By adding Material for MkDocs and Jarvis, we were able to remove MiServer, SMTP, DCL, DrA, SQAPL, HAProxy, Rancher, MySQL/MariaDB, reCAPTCHA, Safe Execute, the load balancer, the multiple container replicas, the dual-repository arrangement, and seven JavaScript libraries used by the MiSite code.

The database was replaced with two TSV (Tab-Separated Values) files in persistent storage. Authorisation, where it exists, is a token that is compared to an environment variable.

After some deliberation, and consulting with our in-house GDPR experts, we also found a way to remove the need to register: the participant’s email is now submitted along with each solution. This let us remove the whole signup-password-confirmation-email procedure, so children at school can use their own (or their parents’) email address without needing access to it immediately, and only receiving an email if they win a prize.

The entire backend is now just over 400 short lines (averaging less than 25 characters) of APL across 16 functions, plus one namespace of 35 English phrases (we plan on adding internationalisation later). For comparison, the previous site used over 2,000 lines of APL and eight files of English phrases and email stubs.

One static site for each round

Each round teaches a different progression of glyphs and concepts. Rather than rendering pages dynamically, we build one Material for MkDocs site for each round. Each round lives in its own uppercase single-letter directory (A through I, with Z reserved for the holding-period site that’s served between rounds). The previous system identified rounds by calendar slots (2024’s round 1 was 20241, and so on). Decoupling round identity from the schedule means that a given round can be re-served later and a round can be designed and tested entirely independently of when it goes live.

A single copy of the shared files and folders – assets, JavaScript, and the legal and front pages – are kept in add/. Before the MkDocs build runs, these are copied into each round’s directory and {{X}} placeholders in markdown files are substituted with the round letter:

 dirs←⊃⊢⍤//1=@1⊢1 0 ⎕NINFO ⎕OPT 1⊢dir,'/?'
 Copy←{
     dest←d,'/',⍺
     ⎕←'COPY -',(2↓∊' & '∘,¨⊆⍵),' → ',∊1 ⎕NPARTS dest,'/'
     dest ⎕NCOPY ⎕OPT'IfExists' 'Replace'⊢(dir,'/add/',⍺,'/')∘,¨⊆⍵
 }
 ls←⊢/¨dirs
 incl←ls∊build
 :For d l :InEach incl∘/¨dirs ls
     ''Copy'mkdocs.yml' 'overrides'
     'docs'Copy'ass' 'js' 'legal.md','index.md'/∘⊂⍨'Z'≠l

     files←(0∊2∘⊃∊⎕D⍨)¨⍤⎕NPARTS⍛/⊃⎕NINFO ⎕OPT 1⊢d,'/docs/*.md'
     :For file :In files
         cont←⊃⎕NGET file 1
         file 1 ⎕NPUT⍨⊂'{{X}}'⎕R l ⎕OPT'Regex' 0⊢cont
     :EndFor

     :If build∩0 1
         {⎕←⍵}¨⊃⊃⎕SHELL ⎕OPT'WorkingDir'd⊢'python -m mkdocs build'
     :EndIf
 :EndFor

A hot-swappable Jarvis

Jarvis serves a static directory using its HTMLInterface setting. However, that setting was defined as a field that was read once when the server starts, and we needed to be able to switch which round’s site/ directory was being served while the server was running, on a fixed UTC schedule. So we amended Jarvis to make HTMLInterface a property: assigning to it now adjusts the private fields that were previously only set at boot time.

That small change is what makes the rest of the design work; round switching is a background thread that watches a single TSV file:

 :Repeat
     :Trap 0
         newChange←13 ⎕NINFO filename
         :If change≢newChange
         :OrIf 0∊40 ⎕ATX'utc' 'dir'
             change←newChange
             (utc dir)←⊃(TSV ⎕OPT'Invert' 2)filename ⍬ 1 1
             inst.Log'Schedule loaded from ',filename
         :EndIf
         begin←1 ⎕DT(2⊃⎕VFI)¨' '@(~∊∘⎕D)¨utc
         interval←begin⍸now
         :If 900⌶0
             newRound←interval⊃dir
         :EndIf
         :If round≢newRound
             round←newRound
             html←⌽round@7⊢'/',(⊃∊'/\'⍨)⍛↓⌽inst.HTMLInterface
             inst.Log'Web interface changed: ',inst.HTMLInterface,' → ',html
             inst.HTMLInterface←html
             file←inst.CodeSource,'/',round,'.json5'
         :EndIf

         newTestFileInfo←FileInfo file
         :If fileInfo≢newTestFileInfo
             fileInfo←newTestFileInfo
             inst.Log'Test data updated from ',file
             data←0 ⎕JSON ⎕OPT'Dialect' 'JSON5'⊃⎕NGET file
         :EndIf

         :If ~900⌶0
             :Leave
         :EndIf
         t←20 ⎕DT'Z'
         ⎕DL 1+1800|¯1-t ⍝ wait until next half-hour
     :Else
         …
     :EndTrap
 :EndRepeat

Every half an hour, the thread re-reads schedule.tsv if its modification time has changed. If the round that should be live differs from the one currently being served, then the HTMLInterface property is updated; the corresponding test data is then (re-)loaded if either the test data file or the round has changed. The schedule itself is a two-column TSV mapping UTC start times to round letters:

utc                 dir
…
2026-01-30 09:00    Z
2026-02-11 09:00    G
2026-04-30 08:00    Z
2026-05-01 08:00    C
2026-07-31 08:00    Z
2026-08-02 08:00    H
…

Editing this file is the entire process for scheduling rounds. There’s no separate content repository, and there’s no longer an aspiration to swap content underneath a running site without telling the site about it. The builds tend to take less than 30 seconds, and the live site is swapped with only a few seconds of downtime:

Jenkins build times for the MkDocs-based APL Challenge

A huge improvement over the previous architecture:

Jenkins build times for the MiServer-based APL Challenge

The JSON5 test catalogue

Each round has a JSON5 file with members P1 through P10 describing how to test each problem. The format was inherited from the MiServer-based system; the test framework that consumes it has been completely rewritten. Here’s part of one:

{
    P1: {
        r:"^ *⍳ *\\d+",
        s:"⍳24",
    },
    …
    P7: {
        a: [
            "'HELLO'",
            "'DYALOG'",
            "'APL'",
            "{⍵[?≢⍵]}¨'AEIOU' 'BCDFGHJKLM' 'NPQRSTVWXYZ' 'AEIOU'",
        ],
        f: "{(2⊃⍵),(2⊃⌽⍵)}",
    },
    …
}

Problems 1–6 ask the participant to type one specific expression that produces one specific value, so the specification carries a reference solution s and a regex r that describes the acceptable solutions. Problems 7-10 ask for a function: the specification carries a reference function f, an array of test arguments a, and (optionally) a preprocessor p to apply before comparing the user’s result with the reference’s. Since the JSON5 strings are APL expressions, we can generate random tests as needed to prevent participants from hard-coding answers.

Submitting an answer

JavaScript included in the frontend makes the user’s browser POST [lang, problem, code, email] to the /Submit endpoint. First, we do some sanity checking:

 (rc msg)←req Submit(lang problem code email);anon;T
 lang ⎕C⍨←¯3
 :If ~lang⊂⍛∊##.⎕NL ¯9
     lang←'en'
 :EndIf
 T←lang∘Text
 :If 'Z'≡round
     (rc msg)←8(T'closed')
 :ElseIf ~problem⊂⍤,⍤⍕⍛∊⍕¨⍳10
     (rc msg)←8(T'badProbNo')
 :ElseIf 0∊∊' '=⊃0⍴⊂code
     (rc msg)←8(T'badCode')
 :OrIf 255<≢code←∊code
     (rc msg)←8(T'longCode')
 :ElseIf 0∊∊' '=⊃0⍴⊂email
     email←'^\s+|\s+$'⎕R''∊email
 :OrIf (''≢email)∧⍬≡'\S.*@.*\S'⎕S 3⊢email
 :OrIf email∊⍨⎕UCS 9
 :OrIf 254<≢email
     (rc msg)←¯8(T'badEmail')
 :Else
     …

We then test the submission and add an entry to the sub[mission]s and, if correct, wins database files:

     …
     (rc msg)←'data' 'problem' 'lang'⎕NS⍛Test code
     …
     ((unixtime:20 ⎕DT'Z' ⋄ code:'\t'⎕R'␉'⊢code)⎕NS'problem' 'rc')AppendTSV dbDir,'/subs.tsv'
     :If 0=rc
         req.Server.Log anon,' solved ',round,' problem ',problem
         :Hold 'wins'
             'email' 'problem' 'round'⎕NS⍛AppendTSV dbDir,'/wins.tsv'
         :EndHold
     :EndIf
 :EndIf

Responses consist of a return code and a message, ready for the frontend to render as a Material admonition:

Example admonition

The submission record uses two new Dyalog v20.0 features:

(unixtime:20 ⎕DT'Z' ⋄ code:'\t'⎕R'␉'⊢code)⎕NS'problem' 'rc'

The parenthesised expression is a namespace literal containing unixtime and code with their value expressions, and ⎕NS'problem' 'rc' amends the reference left argument (new!) with copies of those two variables. The result is a four-member namespace, ready to be appended as a TSV row. I chose to pass the row as a namespace rather than an ordered vector so that the table column order wouldn’t matter.

The test harness

Test is the largest function in the codebase (110 lines), but mostly consists of validation; the actual evaluation is short. The function begins with more usage of new Dyalog v20.0 features:

 (rc msg)←params Test code;…
 ⎕THIS ⎕NS params
 :Trap ⎕VGET⊂'debug' 0
     T←lang∘Text
     spec←data ⎕VGET⊂'P',problem
     …

⎕THIS ⎕NS params merges the entire params namespace into the current function’s scope, so data, problem, and lang (set up by the caller in Submit) are now ordinary local names with no params. qualifier required. data ⎕VGET⊂'debug' 0 decides whether to trap unexpected errors at the top level (during development, I set debug←1 from the session), and ⎕VGET then reads the specification for the requested problem out of the test data.

When initial validation is complete, the function proceeds based on the type of specification that was supplied (a regex-and-value check for problems 1-6, or a test-cases-and-function check for 7-10) and (after more specific validation) either runs the user’s code with ()⍎ (to run it in a separate namespace) or composes a small expression that wraps the user’s function and the reference function side-by-side and runs both in turn. (Debugging the generated expression is where inline tracing came in handy for me.) If the answers match, the frontend gets 0 'Your answer is correct. Good job!'. Otherwise the message tells them whether the answer was wrong, the methods used were wrong, the symbols used were wrong, or the code crashed.

Security is delivered by a safe character set, which stops glyphs that are dangerous (, , , ) or can lead to runaway execution (, , ):

safe←'+-×÷⌈⌊*⍟|!○~∨∧⍱⍲<≤=≥>≠.@≡≢⍴,⍪⍳↑↓?⍒⍋⍉⌽⊖∊⊥⊤⌹⊂⊃∪∩⍷⌷∘/⌿\⍀¨⍨⊆⍥⊣⊢⍤⌸⌺⍸()[];⋄:⍛⍬{}⍺⍵¯ '

This, together with limited memory, makes the new system light-weight enough that we can run it in the main thread, thus saving on thread juggling.

Authorisation as a one-liner

There are only two protected endpoints – Table for reading the TSVs, and Purge for deleting records – both for our own administrative use. (Participants don’t authenticate at all; an email goes in with each submission, the browser remembers it client-side, and that’s it. The worst that can happen is that I submit a correct answer with your email address, and that you then get a single notification email about having won a competition you haven’t heard of. Your email is then purged from our systems.) A single dfn, evaluated once for each protected request, is sufficient:

Authorised←{(⊂⍵.GetHeader'AuthToken')∊(⎕VGET⊂'authToken' '' ⋄ Env'AUTHTOKEN')~⊂''}

The valid token is whichever of the variable authToken (set manually for local testing) and the environment variable $AUTHTOKEN (set by Docker from a Jenkins credential) is non-empty. ⎕VGET returns its default of '' if authToken isn’t defined. The request’s AuthToken header is compared against the resulting list.

The “database”

The two storage tables are TSV files. They’re created at server start if they don’t exist. The combination of :For:In with multi-line array notation makes it easy to spot what goes where, whilst preventing the lines from getting too long:

 :For name head :In (
         'subs'('code' 'rc' 'problem' 'unixtime')
         'wins'('email' 'problem' 'round')
     )
     pathfile←dbDir,'/',name,'.tsv'
     :If ~⎕NEXISTS pathfile
         pathfile ⎕NPUT⍨[head ⋄ ]TSV''
     :EndIf
 :EndFor

[head ⋄ ] is array notation for a one-row matrix containing the row head – a header row. TSV is ⎕CSV with the separator pre-set to the Tab character and quotes disabled (Submit already protected us against Tab characters in the submission by replacing them with the Unicode control picture ):

 TSV←⎕CSV ⎕OPT('Separator'(⎕UCS 9) ⋄ 'QuoteChar' '')

subs.tsv contains code, rc, problem, and unixtime – every submission, with no email column. wins.tsv contains email, problem, and round – correct submissions only, with no code column and no timestamp. The two tables share no column that would let you join a participant’s email to the code they typed. We can produce statistics (“how many people solved problem 7 of round G?”), and we can email prize winners, but we can’t, even ourselves, look at a piece of submitted code and say which person wrote it. That’s deliberate.

Appending a record is a one-liner that takes a namespace whose names match the table’s columns, slots the values into a fresh row in the right column order, and asks ⎕NPUT to append (2):

 r←file 2 ⎕NPUT⍨[ns.⎕VGET⊃⌽(TSV ⎕OPT'Records' 1)file ⍬ 4 1 ⋄ ]TSV''

(TSV ⎕OPT'Records' 1)file ⍬ 4 1 reads only the header row (one record), and ns.⎕VGET then plucks the values out of the namespace in that order.

Reading a whole table for the administrative Table endpoint is:

 resp←0 ⎕JSON 1 ⎕JSON⊂2(TSV path ⍬ 4)

Here, we read the file then round-trip through ⎕JSON to leverage a dataset wrapper – introduced in Dyalog v19.0 – that transforms a table into a vector of namespaces. To see what’s happening, here is a sample wins table:

      wins←[
            'email'           'problem' 'round'
            'foo@example.com' 1         'A'
            'bar@example.com' 2         'B'
           ]
      1 ⎕JSON⊂2 wins
[{"email":"foo@example.com","problem":1,"round":"A"},{"email":"bar@example.com","problem":2,"round":"B"}]

Jarvis will respond with JSON data, but does the conversion from APL array for us, so we preempt the double-conversion with 0 ⎕JSON. Yes, it is unnecessary work, but it happens rarely enough not to matter, and even with the round-trip it is faster than the alternative, more-involved, ⊃{()⎕VSET(↑⍵)⍺}⍤1/TSV path ⍬ 4 1.

Behind the scenes with

Dyalog v20.0 introduced the new behind operator, . This uses the left operand to provide a left argument to the right operand:

  • X f⍛g Y is (f X) g Y
  • f⍛g Y is (f Y) g Y

This is a very common pattern, and across the 400-line backend appears over 30 times. Here are some of the representative uses:

New 'data' 'problem' 'lang'⎕NS⍛Test code Make Test take a list of names representing a namespace,
rather than taking a namespace reference
Old (⎕NS'data' 'problem' 'lang')Test code
New lang⊂⍛∊##.⎕NL ¯9 Make look for a single whole text,
rather than for each letter
Old (⊂lang)∊##.⎕NL ¯9
New mask~⍛/what / means “filter in”; “keep the indicated”
~⍛/ means “filter out”; “remove the indicated”
Old (~mask)/what
New (∨/spec.s∘∊)⍛/¨groups ⍛/ is a monadic filtering operator
f⍛/¨Y filters each element of Y by the predicate function f
Old ((∨/spec.s∘∊)¨groups)/¨groups

I especially like how allows me to extend the primitive vocabulary of APL by providing oft-needed variants of existing primitives; X⊂⍛∊Y and X~⍛/Y and f⍛/Y could easily have been primitives in their own right.

Internationalisation

Responses to submissions, including both success and failure messages, as well as reports about internal errors in the system (luckily, we’ve only had one minor error, and it was due to a mistake in the frontend), are sent as plain human-readable text together with a return code. The frontend renders it as a Material-style admonition with the colour and icon determined by the return code. English phrasings live in an array notation namespace in the file en.apla:

(
 and:'and'
 ansCorrect:'Your answer is correct. Good job!'
 ansErr:'Error<p>Your answer caused a'
 ansPass:'Your answer passed all tests. Good job!'
 arg:'argument'
 as:'as'
 badEmail:'Invalid email address'
 …
)

Jarvis loads .apla files together with the other application code. Now, you might have spotted the line :If ~lang⊂⍛∊##.⎕NL ¯9 and thought that this looks error-prone; surely, a malicious actor could issue a POST with the language set to the name of an unrelated namespace! Worry not; the line lang ⎕C⍨←¯3 makes sure only namespaces with entirely lowercase names are reachable as language packs, while all the system’s top-level namespaces begin with a capital letter. Phew, close!

Adding a new language to the backend consists of adding one file. Of course, the problem statements and informational pages will also need translation, but Material for MkDocs supports internationalisation by having a separate directory per language. I only recently finished composing all nine planned rounds of content, and want to harmonise them a bit more before we begin translating, but at least the wiring is there.

Looking back and forwards

The previous stack was a very useful test of the code and tools that we supply: it exercised our libraries, gave us bug reports against our own tools, and kept us using what we were selling. However, it also accumulated technical debt, and as the system evolved, the seams started to show.

The new stack also uses Dyalog technology (Jarvis, HttpCommand, Conga, and most of the newest language features) and these are really the things we’re selling today. It does so much more economically too: the whole thing runs on a single virtual CPU with 512 MB of RAM, whereas the old system used at least two virtual CPUs (more during peaks) with 1 GB of RAM each.

Have a look at Jarvis! It is really easy to get started. Here’s the bare minimum to get from a clear workspace to a running service with a good architecture in a folder /rot (replace with any other folder you want to use instead):

  1. Design your API: We’ll make it really simple; create a single function Rotate←{⊃⌽/⍵} in the workspace.
  2. Create the folder and function source file: ]Create # /rot
  3. Create the Jarvis configuration file: The easiest is ]Repr (CodeLocation:'.' ⋄ HTMLInterface:'.' ⋄ IncludeFns:'Rotate') -f=json -o=/rot/jarvis.json
  4. Create the HTML interface: Here’s one with two input fields, a button, and the minimal JavaScript needed to make the button work:
    <!DOCTYPE html>
    <html>
      <head>
        <title>Jarvis Text Rotator</title>
        <script>
          Exec=()=>{
            fetch("/Rotate",{
              method:"POST",
              headers:{"content-type":"application/json; charset=utf-8"},
              body:JSON.stringify([steps.valueAsNumber, text.value])
            }).then(r=>r.json()).then(d=>out.innerHTML=d)
          }
        </script>
      </head>
      <body>
        <input id=steps placeholder=Steps type=number>
        <button onclick=Exec()>⌽</button>
        <input id=text placeholder=Text>
        <br>
        <output id=out></output>
      </body>
    </html>

    Save this text to /rot/index.html.

  5. Get Jarvis: ]Get github.com/Dyalog/Jarvis/blob/master/Source/Jarvis.dyalog
  6. Start the server: Jarvis.Run'/rot/jarvis.json'
  7. Try it: Open localhost:8080 in your browser, fill in the fields, and click the button!

Try HttpCommand, too! While the above server is running, we can easily use it as a micro-service, bypassing the HTML frontend:

  1. Get HttpCommand: ]Get HttpCommand
  2. Issue the command: resp←HttpCommand.GetJSON'POST' 'localhost:8080/Rotate' (2 'Hello')
  3. Inspect the response: resp.Data – this will give the character vector 'lloHe'

If you’ve made it all the way to here, congratulations! Don’t forget to promote the APL Challenge and the APL Forge – they are available year-round!

Announcing the Beta Programme for Dyalog APL Version 18.2


I am very pleased to be able to announce the start of the Beta testing of the next release of Dyalog APL! As explained in June, we decided to delay the release of version 18.1 in order to take a closer look at some of the optimisations that had been implemented in 18.0 (and were therefore also present in 18.1).

Our analysis of the optimised code concluded that, due to the nature of many of the new algorithms involving modern vector instructions and code generated from templates, we need significantly more time to create tests that will cover absolutely all the different cases that have been implemented. As a result, we do not feel that it is in the best interests of our users to release v18.1 in its current form.

The good news is that the features of modern source code management systems, combined with our collection of regression tests, have allowed us to create a new version of Dyalog APL that contains all of the new features added to versions 18.0 and 18.1. This includes bug fixes made during these two development cycles, but not the optimisations that make us feel uncomfortable. To differentiate this version from the existing version 18.1, we have decided to call the new version 18.2.

Everyone who was signed up for the v18.1 beta programme should now be able to download v18.2 beta. If you are not signed up as a beta tester already but would like to help us with testing, please get in touch. Under Microsoft Windows, testing should be significantly simpler this year, as we have started producing Microsoft Patch files (MSP) as the delivery mechanism for updates – something I have personally been looking forward to since before I joined Dyalog Ltd!

Version 18.2 Performance

One unavoidable consequence of the above is that the performance of v18.2 is closer to that of v17.1 than to v18.0. Our own tests show that we have not been pushed ALL the way back to “square one”: v18.2 appears to be slightly faster than version 17.1. Once v18.2 has been delivered we will work on carefully re-implementing the most valuable optimisations that have been removed. We welcome your input on which primitives you think we should speed up first, so please participate in testing v18.2 and let us know what you think we should prioritise as we start work on version 19.0!

Recommendations regarding Version 18.0

Over the summer, only one additional defect related to performance optimisation was discovered and fixed. We are not currently aware of any outstanding defects in v18.0 caused by recent optimisations. Dyalog Ltd is committed to providing support for version 18.0 until the arrival of the 3rd release following it, in accordance with normal policy.

However, if you have not yet upgraded to v18.0, Dyalog Ltd strongly recommends remaining on your current version and moving directly to v18.2 when it is released. If all goes well, this will happen at the end of 2021 or very early in 2022. If you are already using v18.0, then we recommend that you make plans to start evaluating v18.2 and moving to it as soon as possible.

Conclusion – and Apology

We are painfully aware that the defects found in v18.0 and the resulting uncertainty have seriously inconvenienced some of our users, and I apologise for this. The root cause is the growth and rejuvenation of the Dyalog development team. Our original processes for quality assurance relied on years of tacit knowledge; when enthusiastic new team members break significant new ground, more explicit planning and QA processes are required to make sure that new approaches are safe and stable.

When we resume work on optimisations following the release of v18.2, this will be done according to new guidelines that require the process to begin with a careful risk/benefit analysis of any enhancement to primitive functions. We will do everything that we can to move forward in a way that will allow us all to eventually look back on the events of 2021 as a significant step towards a more capable and reliable development organisation and product.

After all, in another two years it will be time to celebrate 40 years of Dyalog APL!

News and Recommendations Regarding Dyalog Versions 17.1, 18.0 and 18.1

Background

Dyalog version 18.0 contained the largest number of optimised algorithms in the history of Dyalog APL. Unfortunately, since its release, we have found that our process for review and testing of optimisations was insufficient to cope with the quantity and nature of the optimisations. Many of the new algorithms had edge cases that escaped both the existing regression tests and the new tests that were written as part of development.

In August 2020, after the issues were reported following the release of Dyalog version 18.0, we issued a caution against the use of version 18.0 in production. We added significant additional testing during September and October, and subsequently lifted the caution in November. In April 2021, a user encountered an additional defect leading to incorrect results in a rare case of membership (), and in June a crash in an extremely rare case of “where” (monadic ). Since then, our own internal testing has found a defect in interval index (dyadic ).

Current Status

We have made patches available as defects have been detected and fixed, and “Issue 4” of version 18.0, made available to all users on 22 June 2021, includes fixes for all known issues with optimised code.

Download Dyalog version 18.0 Issue 4: Commercial users  Non-commercial users

At this time, we are executing two high priority projects, one aimed at increasing safety in the short term, the other at completely eliminating problematic optimisations within a few months:

  • We have further increased our internal testing of version 18.0, to verify the correctness of the optimisations that are still in the distributed product.
  • We are holding back the release of version 18.1, which was originally scheduled for release this month, while we remove most of the optimisations that went into version 18.0. We hope to release version 18.1 before the end of the 3rd Quarter of 2021, but will not let it out the door until we have complete confidence in this new version.

Recommendations

Dyalog Ltd recommends that users of Dyalog APL take the following action:

  • If you are using version 17.1 or earlier, plan to skip version 18.0 and upgrade directly to version 18.1 when it becomes available.
  • If you are using version 18.0, apply the latest patches or reinstall using Issue 4 when it becomes available. Continue to monitor our DSS e-mail broadcasts and apply patches quickly if we should find and fix any further issues that you think might impact your application. When version 18.1 becomes available, plan to upgrade to it at your earliest convenience.

Support for Versions 17.1 and 18.0

Considering this extraordinary situation, we have decided to extend support for version 17.1 for one additional release cycle. In other words, version 17.1 will be supported until we release the 4th subsequent version of Dyalog APL. Note that we are about to provide an updated version 17.1 installer (“Issue 3”) which collects all updates to version 17.1 since the original release.

Version 18.0 will be supported for the normal number of cycles, that is, until we release the 3rd following version. However, Dyalog Ltd recommends upgrading to version 18.1 as soon as it becomes available and you are able to schedule the upgrade.

Non-Commercial Versions

Given the nature of these defects, it is our intention to update our non-commercial distributions of version 18.0, although this process may lag behind the distribution of patches to clients paying for support by some weeks.

Conclusion

We apologise for the significant inconvenience that we know this is causing for some of you and thank you for your patience. We have made significant changes to our internal procedures regarding risk assessment, verification, and testing requirements for changes to existing primitive functions, to avoid a recurrence.

We do expect to re-apply those optimisations that we believe provide significant performance benefits to justify the risk of making changes over the next few releases, using new processes.

Sincerely,

Morten Kromberg
CTO, Dyalog Ltd.

Towards Improvements to Stencil

Background

The stencil operator was introduced in Dyalog version 16.0 in 2017. Recently we received some feedback (OK, complaints) that (a) stencil does padding which is unwanted sometimes and needs to be removed from the result and (b) stencil is too slow when it is not supported by special code.

First, stencil in cases supported by special code is much faster than when it is not. The special cases are as follows, from Dyalog ’17 Workshop SA3.

   {⍵}      {⊢⍵}      {,⍵}      {⊂⍵}
{+/,⍵}    {∧/,⍵}    {∨/,⍵}    {=/,⍵}    {≠/,⍵}  
    
{  +/,A×⍵}    {  +/⍪A×⍤2⊢⍵}
{C<+/,A×⍵}    {C<+/⍪A×⍤2⊢⍵}
C:  a single number or variable whose value is a single number
A:  a variable whose value is a rank-2 or 3 array
The comparison can be < ≤ ≥ > = ≠
odd window size; movement 1; matrix argument
 

You can test whether a particular case is supported by using a circumlocution to defeat the special case recognizer.

   )copy dfns cmpx

   cmpx '{⍵}⌺3 5⊢y' '{⊢⊢⍵}⌺3 5⊢y' ⊣ y←?100 200⍴0
  {⍵}⌺3 5⊢x   → 4.22E¯4 |      0%                               
  {⊢⊢⍵}⌺3 5⊢x → 5.31E¯2 | +12477% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   cmpx '{⌽⍵}⌺3 5⊢y' '{⊢⊢⌽⍵}⌺3 5⊢y'
  {⌽⍵}⌺3 5⊢y   → 2.17E¯1 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  {⊢⊢⌽⍵}⌺3 5⊢y → 2.21E¯1 | +1% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

If the timings are the same then there is no special code.

Padding and performance improvements will take a lot of work. For example, for padding (i.e. the treatment of cells at the edge of the universe) multiple options are possible: no padding, padding, wrap from opposite edge, etc. While working on these improvements I hit upon the idea of writing a stencil function which produces the stencil cells. It only works with no padding and only for movements of 1 (which I understand are common cases), but turns out to be an interesting study.

A Stencil Function

⍺ stencell ⍵ produces the stencil cells of size from  , and is equivalent to {⍵}⌺⍺⊢⍵ after the padded cells are removed.

stencell←{
  ⎕io←0                 ⍝ ⎕io delenda est!
  s←(≢⍺)↑⍴⍵
  f←1+s-⍺               ⍝ frame AKA outer shape
  m←⊖×⍀⊖1↓s,1           ⍝ multiplier for each axis
  i←⊃∘.+⌿(m,m)×⍳¨f,⍺    ⍝ indices
  (⊂i) ⌷ ⍵ ⍴⍨ (×⌿(≢⍺)↑⍴⍵),(≢⍺)↓⍴⍵
}

For example, stencell is applied to x with cell shape 3 5 .

   ⊢ x←6 10⍴⍳60                    ⍝ (a)
 0  1  2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59

   c←3 5 stencell x                ⍝ (b)
   ⍴c
4 6 3 5

   c ≡ 1 2 ↓ ¯1 ¯2 ↓ {⍵}⌺3 5 ⊢x    ⍝ (c)
1

   ⊢ e←⊂⍤2 ⊢c                      ⍝ (d)
┌──────────────┬──────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│ 0  1  2  3  4│ 1  2  3  4  5│ 2  3  4  5  6│ 3  4  5  6  7│ 4  5  6  7  8│ 5  6  7  8  9│
│10 11 12 13 14│11 12 13 14 15│12 13 14 15 16│13 14 15 16 17│14 15 16 17 18│15 16 17 18 19│
│20 21 22 23 24│21 22 23 24 25│22 23 24 25 26│23 24 25 26 27│24 25 26 27 28│25 26 27 28 29│
├──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│10 11 12 13 14│11 12 13 14 15│12 13 14 15 16│13 14 15 16 17│14 15 16 17 18│15 16 17 18 19│
│20 21 22 23 24│21 22 23 24 25│22 23 24 25 26│23 24 25 26 27│24 25 26 27 28│25 26 27 28 29│
│30 31 32 33 34│31 32 33 34 35│32 33 34 35 36│33 34 35 36 37│34 35 36 37 38│35 36 37 38 39│
├──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│20 21 22 23 24│21 22 23 24 25│22 23 24 25 26│23 24 25 26 27│24 25 26 27 28│25 26 27 28 29│
│30 31 32 33 34│31 32 33 34 35│32 33 34 35 36│33 34 35 36 37│34 35 36 37 38│35 36 37 38 39│
│40 41 42 43 44│41 42 43 44 45│42 43 44 45 46│43 44 45 46 47│44 45 46 47 48│45 46 47 48 49│
├──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│30 31 32 33 34│31 32 33 34 35│32 33 34 35 36│33 34 35 36 37│34 35 36 37 38│35 36 37 38 39│
│40 41 42 43 44│41 42 43 44 45│42 43 44 45 46│43 44 45 46 47│44 45 46 47 48│45 46 47 48 49│
│50 51 52 53 54│51 52 53 54 55│52 53 54 55 56│53 54 55 56 57│54 55 56 57 58│55 56 57 58 59│
└──────────────┴──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘

    ∪¨ ,¨ e-⍬⍴e                    ⍝ (e)
┌──┬──┬──┬──┬──┬──┐
│0 │1 │2 │3 │4 │5 │
├──┼──┼──┼──┼──┼──┤
│10│11│12│13│14│15│
├──┼──┼──┼──┼──┼──┤
│20│21│22│23│24│25│
├──┼──┼──┼──┼──┼──┤
│30│31│32│33│34│35│
└──┴──┴──┴──┴──┴──┘
(a)  The matrix x is chosen to make stencil results easier to understand.
(b) stencell is applied to x with cell shape 3 5 .
(c) The result of stencell is the same as that for {⍵}⌺ after cells with padding are dropped.
(d) Enclose the matrices in c (the cells) to make the display more compact and easier to understand.
(e) Subsequent discussion is based on the observation that each cell is some scalar integer added to the first cell.

Indices

The key expression in the computation is

   ⊃∘.+⌿(m,m)×⍳¨f,⍺ 

where

   m:  10 1;  multiplier for each axis
   f:  4 6;  multiplier for each axis
   ⍺:  3 5;  multiplier for each axis
 

We discuss a more verbose but equivalent version of this expression,

   (⊃∘.+⌿m×⍳¨f)∘.+(⊃∘.+⌿m×⍳¨⍺)

and in particular the right half, ⊃∘.+⌿m×⍳¨⍺ , which produces the first cell.

   ⍳⍺               ⍝ ⍳3 5
┌───┬───┬───┬───┬───┐
│0 0│0 1│0 2│0 3│0 4│
├───┼───┼───┼───┼───┤
│1 0│1 1│1 2│1 3│1 4│
├───┼───┼───┼───┼───┤
│2 0│2 1│2 2│2 3│2 4│
└───┴───┴───┴───┴───┘
   (⍴⍵)∘⊥¨⍳⍺        ⍝ 6 10∘⊥¨ ⍳3 5
 0  1  2  3  4
10 11 12 13 14
20 21 22 23 24

Alternatively, this last result obtains by multiplying by m the corresponding indices for each axis, where an element of m is the increment for a unit in an axis. That is, m←⊖×⍀⊖1↓s,1 where s←(≢⍺)↑⍴⍵ is a prefix of the shape of  . The multipliers are with respect to the argument because the indices are required to be with respect to the argument  .

   ⍳¨⍺              ⍝ ⍳¨3 5
┌─────┬─────────┐
│0 1 2│0 1 2 3 4│
└─────┴─────────┘
   m×⍳¨⍺            ⍝ 10 1×⍳¨3 5
┌───────┬─────────┐
│0 10 20│0 1 2 3 4│
└───────┴─────────┘
   ∘.+⌿ m×⍳¨⍺       ⍝ ∘.+⌿ 10 1×⍳¨3 5
┌──────────────┐
│ 0  1  2  3  4│
│10 11 12 13 14│
│20 21 22 23 24│
└──────────────┘
   ((⍴⍵)∘⊥¨⍳⍺) ≡ ⊃∘.+⌿m×⍳¨⍺
1

This alternative computation is more efficient because it avoids creating and working on lots of small nested vectors and because the intermediate results for ∘.+⌿ grows fast from one to the next (i.e., O(⍟n) iterations in the main loop).

The left half, ⊃∘.+⌿m×⍳¨f , is similar, and computes the necessary scalar integers to be added to the result of the right half.

   ⊃ ∘.+⌿ m×⍳¨f     ⍝ ⊃ ∘.+⌿ 10 1×⍳¨4 6
 0  1  2  3  4  5
10 11 12 13 14 15
20 21 22 23 24 25
30 31 32 33 34 35

The shorter expression derives from the more verbose one by some simple algebra.

(⊃∘.+⌿m×⍳¨f)∘.+(⊃∘.+⌿m×⍳¨⍺)    ⍝ verbose version
⊃∘.+⌿(m×⍳¨f),m×⍳¨⍺             ⍝ ∘.+ is associative
⊃∘.+⌿(m,m)×(⍳¨f),⍳¨⍺           ⍝ m× distributes over ,
⊃∘.+⌿(m,m)×⍳¨f,⍺               ⍝ ⍳¨ distributes over ,

I am actually disappointed that the shorter expression was found ☺; it would have been amusing to have a non-contrived and short expression with three uses of ∘.+ .

Cells

Having the indices i in hand, the stencil cells obtain by indexing into an appropriate reshape or ravel of the right argument  . In general, the expression is

   (⊂i) ⌷ ⍵ ⍴⍨ (×/(≢⍺)↑⍴⍵),(≢⍺)↓⍴⍵

specifies the cell shape. If (≢⍺)=≢⍴⍵ , that is, if a length is specified for each axis of  , the expression is equivalent to (⊂i)⌷,⍵ or (,⍵)[i] ; if (≢⍺)<≢⍴⍵ , that is, if there are some trailing unstencilled axes, the expression is equivalent to (,[⍳≢⍺]⍵)[i;…;] (the leading ≢⍺ axes are ravelled) or ↑(,⊂⍤((≢⍴⍵)-≢⍺)⊢⍵)[i] (as if the trailing axes were shielded from indexing). The general expression covers both cases.

Application

stencell makes it possible to workaround current shortcomings in . The alternative approach is to use stencell to get all the stencil cells, all at once, and then work on the cells using  , +.× , and
other efficient primitives.

The following example is from Aaron Hsu. In the original problem the size of x is 512 512 64 .

   K←?64 3 3 64⍴0
   x←?256 256 64⍴0

   t←1 1↓¯1 ¯1↓{+/⍪K×⍤3⊢⍵}⌺3 3⊢x
   ⍴t
256 256 64

   cmpx '1 1↓¯1 ¯1↓{+/⍪K×⍤3⊢⍵}⌺3 3⊢x'
6.76E0 

The computation is slow because the cells are rank-3, not supported by special code. Aaron then devised a significant speed-up using a simpler left operand to create the ravels of the cells (but still no special code):

   t ≡ (1 1↓¯1 ¯1↓{,⍵}⌺3 3⊢x)+.×⍉⍪K
1
   cmpx '(1 1↓¯1 ¯1↓{,⍵}⌺3 3⊢x)+.×⍉⍪K'
1.67E0 

Use of stencell would improve the performance a bit further:

   t ≡ (,⍤3 ⊢3 3 stencell x)+.×⍉⍪K
1
   cmpx '(,⍤3 ⊢3 3 stencell x)+.×⍉⍪K'
1.09E0 

   cmpx '3 3 stencell x'
6.10E¯2

The last timing shows that the stencell computation is 6% (6.10e¯2÷1.09e0) of the total time.

Materializing all the cells does take more space than if the computation is incorporated in the left operand of  , and is practicable only if the workspace sufficiently large.

   )copy dfns wsreq

   wsreq '1 1↓¯1 ¯1↓{+/⍪K×⍤3⊢⍵}⌺3 3⊢x'
110649900
   wsreq '(1 1↓¯1 ¯1↓{,⍵}⌺3 3⊢x)+.×⍉⍪K'
647815900
   wsreq '(,⍤3 ⊢3 3 stencell x)+.×⍉⍪K'
333462260

Performance

stencell is competitive with {⍵}⌺ on matrices, where it is supported by special code written in C, and is faster when there is no special code. The benchmarks are done on a larger argument to reduce the effects of padding/unpadding done in {⍵}⌺ .

   y2←?200 300⍴0
          
   cmpx '3 5 stencell y2' '1 2↓¯1 ¯2↓{⍵}⌺3 5⊢y2' 
  3 5 stencell y      → 1.85E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕           
  1 2↓¯1 ¯2↓{⍵}⌺3 5⊢y → 2.91E¯3 | +57% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   cmpx '3 5 stencell y' '{⍵}⌺3 5⊢y' 
  3 5 stencell y → 1.85E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* {⍵}⌺3 5⊢y      → 1.04E¯3 | -45% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕             

   y3←?200 300 64⍴0

   cmpx '3 5 stencell y3' '1 2↓¯1 ¯2↓{⍵}⌺3 5⊢y3' 
  3 5 stencell y3      → 8.90E¯2 |    0% ⎕⎕⎕                           
  1 2↓¯1 ¯2↓{⍵}⌺3 5⊢y3 → 7.78E¯1 | +773% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

   cmpx '3 5 stencell y3' '{⍵}⌺3 5⊢y3' 
  3 5 stencell y3 → 9.38E¯2 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕                      
* {⍵}⌺3 5⊢y3      → 3.34E¯1 | +256% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

There is an interesting question of whether the shorter version of the key computation (in the Indices section above) is faster than the more verbose version.

   m←10 1 ⋄ f←4 6 ⋄ a←3 5

   cmpx '⊃∘.+⌿(m,m)×⍳¨f,a' '(⊃∘.+⌿m×⍳¨f)∘.+(⊃∘.+⌿m×⍳¨a)'
  ⊃∘.+⌿(m,m)×⍳¨f,a            → 3.75E¯6 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  (⊃∘.+⌿m×⍳¨f)∘.+(⊃∘.+⌿m×⍳¨a) → 5.20E¯6 | +38% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

In this case, it is faster, and I expect it will be faster for cases which arise in stencil calculations, where the argument size is larger than the cell size. But it is easy to think of arguments where ∘.+⌿ is slower than ∘.+ done with a different grouping:

   cmpx '((⍳0)∘.+⍳100)∘.+⍳200' '(⍳0)∘.+((⍳100)∘.+⍳200)' '⊃∘.+/⍳¨0 100 200'
  ((⍳0)∘.+⍳100)∘.+⍳200   → 7.86E¯7 |     0% ⎕⎕                            
  (⍳0)∘.+((⍳100)∘.+⍳200) → 1.05E¯5 | +1234% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕  
  ⊃∘.+/⍳¨0 100 200       → 1.11E¯5 | +1310% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

This question will be explored further in a later post.

Tolerated Comparison, Part 2

This post might be more difficult than our usual fare. You won’t find any interesting APL coding insights here—instead we’ll be focused on the tricky topic of floating-point error analysis. If that’s not your thing, feel free to skip this one! Although if you plan to use tolerated comparison in a real application, you really do need to know this stuff.

In Tolerated Comparison, Part 1, I discussed the structure of tolerant inequality with one argument fixed, and showed that

  • For any real number B, there’s another number b so that a number is tolerantly less than or equal to B if and only if it is intolerantly less than or equal to b.
  • This number is equal to B÷1-⎕CT when B<0, and B×1-⎕CT otherwise.

But these results were proven only for mathematical real numbers, which have many properties among which is the complete inability to be implemented in a silicon chip. To actually apply the technique in Dyalog APL, we must know that it works for IEEE floats like Dyalog uses (we have not implemented tolerated comparison for the decimal floating point numbers used when ⎕FR is 1287, and there are serious concerns regarding precision which might make it impossible to tolerate values efficiently).

Why should we care if a tolerated value is off by one or a few units in the last place? It’s certainly unlikely to cause widespread chaos. But we think programmers should be able to expect, for instance, that after setting i←v⍳x it is always safe to assume that v[i]=x. A language that behaves otherwise can easily cause “impossible” bugs in programs that are provably correct according to Dyalog’s specification. And finding a value that lies just on the boundary of equality with x is not as obscure an issue as it may appear. With the default value ⎕CT←1E¯14, there are at most about 180 numbers which are tolerantly equal to a typical floating-point number x. So it’s not much of a stretch to think that a program which handles a lot of similar values will eventually run into a problem with an inaccurate version of tolerated equality. And this is a really scary problem to debug—even the slightest difference in the values used would make it disappear, frustrating any efforts to track down the cause. We’ve dealt with tolerant comparison issues in the past and this kind of problem is certainly not something we want to stumble on in the future.

On to floating-point numbers. I’m afraid this is not a primer on the subject, although I can point any interested readers to the excellent What Every Computer Scientist Should Know About Floating-Point Arithmetic. In brief, Dyalog’s floating-point numbers use 64 bits to represent a particular set of real numbers chosen to cover many orders of magnitude and to satisfy some nice mathematical properties. We need to know only a surprisingly small number of things about these numbers, though—see the short list below. Here we consider only normal numbers, and not denormal numbers, which appear at extremely small magnitudes. The important result of this post is still valid for denormal numbers, which have higher tolerance for error than normal numbers, but we will not demonstrate this detail here.

Definitions: In the discussion below, q is used as a short name for the value ⎕CT. Unless stated otherwise, formulas below are to be interpreted not as floating-point calculations but as mathematical expressions—there is no rounding and all comparisons in formulas are intolerant. Evaluation order follows APL except that = is used as in mathematics: it has lower precedence and can be used multiple times in chains to show that many values are all equal to each other. The word “error” indicates absolute error, that is, the absolute distance of a computed value from some desired value. The value ulp (from “Unit in the Last Place”) is used to indicate what some might denote ULP(1), the distance from 1 to the next higher floating point number. It is equal to 2*¯52, and it is an upper bound on the error between two adjacent normal floating-point numbers divided by the smaller of their magnitudes.

We will require the following facts about floating point numbers:

  1. Two adjacent (normal, nonzero) floating-point numbers a and b differ by at least 0.5×(|a)×ulp and at most (|a)×ulp.
  2. Consequently, the error introduced by exact rounding in a computation whose exact result is x is at most (|x)×0.5×ulp. The operations +-×÷ are all exactly rounded.
  3. Sterbenz’s lemma: If x and y are two floating-point numbers with x≤2×y and y≤2×x, then the difference x-y is exactly equal to a floating-point number. Theorem 11 in the link above is closely related, and its proof indicates how one would prove this fact.
  4. Given a floating-point number, the next lower or next higher number can be efficiently computed (in fact, provided the initial number is nonzero, their binary representations differ from that number by exactly 1 when considered as 64-bit integers).

We’ll need one other fact, which Dyalog APL guarantees (other APLs might not). The maximum value of ⎕CT is 2*¯32, chosen so that two 32-bit integers can’t be tolerantly equal to each other. Otherwise, integers couldn’t be compared using the typical CPU instructions, which would be a huge performance problem. The value of ulp is 2*¯52 for IEEE doubles, so ⎕CT*2 is at most ulp÷2*12. The proof below holds for ⎕CT*2 as high as ulp÷9, but not for ⎕CT*2 higher than ulp÷8.

Our task

In the following discussion, we will primarily consider the case B>0. We want to define a function tolerateLE which, given B, returns the greatest floating-point value tolerantly less than or equal to B, and to show that every value smaller than tolerateLE B is also tolerantly less than or equal to B. The last post analysed this situation on real (not floating-point) numbers, and showed that in that case tolerateLE B is equal to B÷1-q.

The case B<0 is substantially simpler to analyse, because the formula B×1-q for this case is more tractable. This case is not described fully but can be handled using the same techniques. Also not included is the case B=0. tolerateLE 0 is zero, since comparison with zero is already intolerant.

Error analysis: B÷1-q

(This section isn’t necessary for our proof. But it’s useful to see why the obvious formula isn’t good enough, and serves as a nice warmup before the more difficult computations later.)

When we compute B÷1-q on a computer, how does that differ from the result of computing B÷1-q using the mathematician’s technique of not actually computing it? There are two operations here, and each is subject to floating-point rounding afterwards. To compute the final error we must use an alternating procedure: for each operation, first find the greatest error that could happen if the operation was computed exactly, based on the error in its arguments. Then add another error term for rounding, which is based on the size of the operation’s result.

It’s helpful to know first how inverting a number close to 1 affects its error. Suppose x is such a number, and it has a maximum error x×r. We’ll get the largest possible error by comparing y÷x×1-r to the exact value y÷x (you can verify this by re-doing the calculation below using 1+r instead). The error is

err = | (y÷x) - y÷x×1-r
    = (y÷x) × | 1 - ÷1-r
    = (y÷x) × | r÷1-r

Assuming r<0.5, which will be wildly conservative for our uses, we know that (1-r)>0.5 and hence (÷1-r)<2. So if the absolute error in x is at most x×r, then the absolute error in y÷x (assuming y is exact, and before any rounding) is at most:

err < (y÷x) × 2×r

Now we can figure out the error when evaluating B÷1-q. At each step the rounding error is at most 0.5×ulp times the current value.

⍝computation     error before rounding     error after rounding
1-q              0                         (1-q)×0.5×ulp
B÷1-q            (B÷1-q) × 2×0.5×ulp       (B÷1-q)×1.5×ulp

The actual upper bound on error has a coefficient substantially less than 1.5, since the error estimate for B÷1-q was very conservative. But the important thing is that it’s definitely greater than 1. The value we compute could be one of the two closest to B÷1-q, but it could also be further out. Obviously we can’t guarantee this is the exact value that tolerateLE B should return. But what kind of bounds can we set on that value, anyway?

Evaluating tolerant inequality

The last post showed that, when B>0, a value a is tolerantly less than or equal to B if and only if it is exactly less than or equal to B÷1-q. But that was based on perfectly accurate real numbers. What actually happens around this value for IEEE floats? Let’s say B is some positive floating-point number and at is the exact value of B÷1-q (which might not be a floating-point number). Then suppose a is another floating-point number, and define e (another possibly-non-floating-point number) so that a = at+e. What is the result of evaluating the tolerant less-than formula below?

(a-B) ≤ q × 0⌈a⌈-B

The left-hand side turns out to be very easy to analyse due to Sterbenz’s lemma, which states that if x and y are two floating-point numbers with x≤2×y and y≤2×x, then the difference x-y is exactly equal to a floating-point number, meaning that it will not be rounded at all when it is computed. It’s easy to show that if a>2×B then a is tolerantly greater than B, and that if B>2×a then a is tolerantly less than or equal to B. So in the interesting case, where a is close to B, we know that the following chain of equalities holds exactly:

a-B = e + at-B
    = e + (B÷1-q)-B
    = e + B×(÷1-q)-1
    = e + B×q÷1-q

Now what about the right-hand side? Because B>0 and (by our simplifying assumption in the previous paragraph) a≥B÷2, a is the largest of the three numbers in 0⌈a⌈-B. Floating-point maximum is always exact (since it’s equal to one of its arguments), so the right-hand side simplifies to q×a. This expression does end up rounding. Its value before rounding can be expressed in terms of a-B and e:

q×a = (q×at) + q×e
    = (B×q÷1-q) + q×e
    = (e + B×q÷1-q) - (e - q×e)
    = (a-B) - e×1-q

It’s very helpful here to know that a-B is exactly a floating-point number! q×a will round to a value that is smaller than a-B (thus making the tolerant inequality a≤B come out false) when it is closer to the next-smallest floating-point number than to a-B (if it is halfway between, it could round either way depending on the last bit of a-B). This happens as long as e×1-q is larger than half the distance to that predecessor. The floating-point format guarantees that, as long as a-B is a normal number, this distance is between 0.25×ulp×a-B and 0.5×ulp×a-B, where ulp is the difference between 1 and the next floating-point number. Consequently, if e is less than 0.25×ulp×a-B, we are sure that a will be found tolerantly less than or equal to B, and if e is greater than 0.5×ulp×a-B, it won’t be. If it falls in that range, we can’t be sure.

The zone of uncertainty for the value B←2*÷5 is illustrated above. It contains all the values of a for which we can’t say for sure whether a is tolerantly less than or equal to B, or greater, without actually doing the computation and rounding (that is, the result will depend on specifics of the floating-point format and not just ulp). It’s very small! It will almost never contain an actual floating point value (one of the black ticks), but it could.

If there isn’t a floating point number in the zone of uncertainty, then tolerateLE B has to be the first floating point number to its left. But if there is one, say c, then the value depends on whether c is tolerantly less than or equal to B: if it is, then c = tolerateLE B. If not, then that obviously can’t be the case, and tolerateLE B is again the nearest floating point value to the left of the zone of uncertainty.

Error analysis: B+q×B

How can we compute B÷1-q more accurately than our first try? One good way of working with the expression ÷1-x when x is between 0 and 1 is to use its well-known expansion as in infinite polynomial. A mathematically-inclined APLer (who prefers ⎕IO←0) might write

(÷1-x) = +/x*⍳∞

where the right-hand side represents the infinite series 1+x+x²+x³+…. One fact that seems more obvious when thinking about the series than about the reciprocal is that, defining z←÷1-x, we know z = 1+x×z. So similarly,

(B÷1-q) = B+q×B÷1-q

But it turns out to be much easier than that! The difference between 1 and ÷1-q is fairly close to q. So if we replace ÷1-q by 1, then we end up off by about B×q×q. Knowing that q*2 is much smaller than ulp, we see that this difference is miniscule compared to B. So why don’t we try the expression B+q×B?

The error in using B instead of B÷1-q is

(|B - B÷1-q) = |B × 1-÷1-q
             = |B × ((1-q)-1)÷1-q
             = B × q÷1-q

Multiplying by q, the absolute error of q×B is q×B × q÷1-q, which, knowing that (÷1-q)<2, is less than B × 2×q*2, and consequently less than, say, B×0.05×ulp.

⍝computation   relative to    err before rounding   err after rounding
q×B            q×B÷1-q        B×0.05×ulp            B×(0.05+q)×ulp
B+q×B          B÷1-q          B×0.06×ulp

That’s pretty close: the unrounded error is substantially less than the error that will be introduced by the final rounding (about B×0.5×ulp). Chances are, it’s the closest floating point number to B÷1-q. But it could wind up on either side of that value, so we will need to perform a final adjustment to obtain tolerateLE B.

Note that the new formula B+q×B is very similar to the formula B×1-q which is used when B is negative. In fact, calculating the latter value with the expression B+q×-B will also have a very low error. That means we can use B+q×|B for both cases! However, we will still need to distinguish between them when testing whether the value that we get is actually tolerantly less than or equal to B.

Polishing up

After we calculate a←B+q×B, we still don’t know which way a≤B will go. There’s just too much error to make sure it falls on one side or the other of the critical band. But we do know about the numbers just next to it: a value adjacent to a must be separated from the unrounded value of B+q×B by at least 0.25×B×(1+q)×ulp, or else we would have rounded a towards it. That unrounded value differs from the true value B÷1-q by only 0.06×B×ulp at most, so we know that these neighbors are at least ((0.25×1+q)-0.06)×B×ulp or (rounding down some) 0.15×B×ulp from at. But that’s way outside of the zone of uncertainty, which goes out only to 0.5×ulp×a-B, since a-B is somewhere around q×B.

So we know that the predecessor to a must be tolerantly less than or equal to B, and its sucessor must not be. That leaves us with only two possibilities: either a is tolerantly less than or equal to B, in which case it is the largest floating-point number with this property, or it isn’t, in which case its predecessor is that number. In the diagram above, we can see that the range for a is a little bigger than the gap between ticks, but it’s small enough that the ranges for its predecessor P(a) and successor S(a) don’t overlap with B÷1-⎕CT or the invisibly small zone of uncertainty to its right. In this case a rounds left, so a = tolerateLE B, but if it rounded right, then we would have (P(a)) = tolerateLE B.

So that’s the algorithm! Just compute B+q×|B, and compare to see if it is tolerantly less than or equal to B. If it is, return it, and otherwise, return its predecessor, the next floating point number in the direction of negative infinity. We also add checks to the Dyalog interpreter’s debug mode to make sure the number returned is actually tolerantly less than or equal to B, and that the next larger one isn’t.

APL model

The following code implements the ideas above in APL. Note that it can give a domain error for numbers near the edges of the floating-point range; Dyalog’s internal C implementation has checks to handle these cases properly. adjFP does some messy work with the binary representation of a floating-point value in order to add or subtract one from the integer it represents. Once that’s out of the way, tolerated inequalities are very simple!

⍝ Return the next smaller floating-point number if ⍺ is ¯1, or the
⍝ next larger if ⍺ is 1 (default).
⍝ Not valid if ⍵=0.
adjFP ← {
  ⍺←1 ⋄ x←(⍺≥0)≠⍵≥0
  bo←,∘⌽(8 8∘⍴)⍣(~⊃83 ⎕DR 256) ⍝ Order bits little-endian (self-inverse)
  ⊃645⎕DR bo (⊢≠¯1↓1,(∧\x≠⊢)) bo 11⎕DR ⊃0 645⎕DR ⍵
}

⍝ Tolerate the right-hand side of an inequality.
⍝ tolerateLE increases its argument while tolerantGE decreases it.
⍝ tolerantEQ returns the smallest and largest values equal to its argument.
tolerateLE ← { ¯1 adjFP⍣(t>⍵)⊢ t←⍵+⎕ct×|⍵ }
tolerateGE ← -∘tolerateLE∘-
tolerateEQ ← tolerateGE , tolerateLE

We can see below that tolerateEQ returns values which are tolerantly equal to the original argument, but which are adjacent to other values that aren’t.

      (⊢=tolerateEQ) 2*÷5
1 1
      (⊢=¯1 1 adjFP¨ tolerateEQ) 2*÷5
0 0

Of course, using tolerateEQ followed by intolerant comparison won’t speed anything up in version 17.0: that’s already been done!

Dyalog ’18 Videos, Week 3

The four presentations from Dyalog’18 that we are releasing this week address both the visible (user interface) and invisible (performance) parts of application design. Starting with performance:

“You don’t have to be an engineer to be a racing driver, but you do have to have Mechanical Sympathy.” – former Formula One racing driver Sir John Young “Jackie” Stewart, OBE


This quote was at the heart of the talk by our invited keynote speaker Martin Thompson. In order to write software which performs well, you need to have a basic understanding of how the underlying machinery works. Understanding basic mathematical models for the theoretical throughput of software and hardware helps us take the step from being alchemists to scientists, as we endeavour to write high-performance systems.

Martin takes us for an entertaining stroll through the evolution of modern processors, and some of the maths behind high performance systems. The good news is that systems which make sequential and predictable memory accesses are likely to find sympathy with modern hardware…

Marshall Lochbaum, the most recent addition to the core interpreter team at Dyalog, followed up with a talk on a number of his ideas for increasing the mechanical sympathy of Dyalog APL, to take maximum advantage of branch prediction and other features of modern processors. Some strategies take advantage of runtime inspection of the arguments, something that is more natural in an interpreter with the ability to dynamically select data types, as opposed to strongly typed strategies typically employed by compilers.


TamStat is an application which helps students Tame Statistics. In two talks at Dyalog’18, Stephen Mansour and Michael Baas focus on two different aspects of the user experience. In the first talk, Stephen focuses on the notation available to users of TamStat. Where many statistical libraries contain dozens of strangely named functions with a variety of switches and parameters, TamStat uses a small set of functions, combined with another small set of operators, to provide a very simple but extremely elegant notation for computing probabilities based on a wide variety of distributions. For example:

⍝ Probability that 7 coin flips (0.5 specifying a "fair" coin) will result 
⍝ in at least 3 heads:
7 0.5 binomial probability ≥ 3
⍝ Probability that a number from a normal distribution with a mean of 0 and 
⍝ standard deviation of 1 will be ≤ 3:
0 1   normal   probability ≤ 3

I almost wish I could go back to University and start Statistics 101 again ?.


Notation is a powerful tool of thought, but graphs make it easier to visualise the results. Following Stephen’s talk, Michael Baas describes work that Dyalog is doing in collaboration with Stephen, with the goal of wrapping TamStat in a modern, HTML/JavaScript based frontend. Current TamStat is based on the ⎕WC (Window Create) library function and is therefore restricted to running on Microsoft Windows. However, many of Stephen’s students use Mac or Linux laptops. The new interface also makes it possible to run TamStat as a web-based service with a web site. We expect that this work will make TamStat accessible to a much wider audience.

Summary of this week’s videos: