Name Colouring for Dfns

APL is sometimes criticised because expressions that include names cannot, in general, be parsed without knowing whether the names represent functions or variables. For example, the name thing in the expression thing⍳3 could reference an array (in which case the ⍳ is dyadic) or it could reference a function (making the ⍳ monadic).

An APL expression becomes completely parsable if we distinguish each name with one of four colours, depending on the “kind” of its referent: array, function, monadic operator, dyadic operator. Now, with a bit of practice, we can at least parse thing⍳3 (thing coloured as an array) vs thing⍳3 (thing coloured as a function) without recourse to the definition of thing. Notice how kind differs slightly from APL’s name-class, which does not distinguish monadic from dyadic operators.

Names whose definitions are external to the code under consideration and so cannot be classified would be given a distinct colour, say red, which would at least draw attention to them. Colouring a number of interdependent functions at the same time should help with such issues.

Name-colouring can co-exist with the more superficial token-colouring (green for comments and so forth), though we would probably want to configure the two schemes separately.

There’s a related argument for colouring parentheses to show the kind of value they contain: (~∘(⊂'')). This would mean that we could always determine the valency of a function call, or the kind of a hybrid token, such as /, by looking at the colour of the single token immediately to its left. Finally, we should probably kind-colour curly braces: {}, {⍺⍺ ⍵}, {⍵⍵ ⍵}.

Yeah but how?

In the following, where appropriate, “function” is a shorthand for “function or operator”.

Most generally, we want to process a number of definitions at the same time, so that inter-function references can be coloured. Such a set of functions may be considered as internal definitions in an anonymous outer function so, without loss of generality, we need consider only the case of a single multi-line nested function.

A function can be viewed as a tree, with nested subfunctions as subtrees. The definitions at each lexical level in the function comprise the information stored at each node of the tree.

The colouring process traverses the tree, accumulating a dictionary of (name kind) pairs at each level. At a particular level, definitions are processed in the same order as they would be executed, so that the (name kind) entries from earlier definitions are available to inform later expressions. Visiting each node before traversing its subtrees ensures that outer dictionary entries are available to lexically inner functions. Prefixing dictionary entries for inner functions ensures that a left-to-right search (à la dyadic iota) finds those names first, thus modelling lexical name-shadowing.
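As a minimal sketch of the lookup (the names and entries below are made up, not the project’s code), one way to picture the dictionary is as parallel vectors of names and kinds, with inner-scope entries prefixed so that a left-to-right search with dyadic ⍳ finds them first:

      outer←('sum' 'n')('function' 'array')        ⍝ hypothetical outer-scope entries
      inner←('n' 'twice')('function' 'function')   ⍝ inner scope redefines n
      names kinds←inner,¨outer                     ⍝ prefix the inner entries
      kinds⊃⍨names⍳⊂'n'                            ⍝ left-to-right: first match wins
function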

Each assignment expression adds one or more names to the dictionary. The kind of the assignment is inferred by looking at the defining expression to the right of the assignment arrow. This sounds heavy, but if we examine the expression from right-to-left then we can often stop after examining very few tokens. For example, an arbitrarily long expression ending in (… function array) must, if it is syntactically correct, reduce to an array value, while (… function function) must be a function (train). A sequence in braces {…} resolves to a function, monadic operator or dyadic operator, depending only on whether there are occurrences of ⍺⍺ or ⍵⍵ at its outermost level.
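For the braces case, a classifier need look only at the text at brace-depth one. Here is a minimal sketch as a dfn (kindOf is a hypothetical helper, and for brevity it ignores ⍺⍺ and ⍵⍵ occurrences hidden in comments or character constants):

      kindOf←{                          ⍝ kind of a dfn, given its source text ⍵
          d←+\(⍵='{')-0,¯1↓⍵='}'        ⍝ brace-nesting depth of each character
          top←(d=1)/⍵                   ⍝ keep only the outermost level
          ∨/'⍵⍵'⍷top:'dyadic operator'
          ∨/'⍺⍺'⍷top:'monadic operator'
          'function'
      }
      kindOf '{⍺+⍵}'
function
      kindOf '{⍺⍺ ⍵}'
monadic operator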

This approach to colouring, while relatively simple, is not perfect. It takes no account of the order in which names are defined relative to their being referenced: the assumption is that all definitions at an outer level are available to its inner levels. The following small function illustrates the problem:

    {
        M←{A}      ⍝ this A correctly coloured function
        N←{A}⍵     ⍝ this A should be coloured unclassified
        A←÷          ⍝ A defined as a function
        M N          ⍝ application of function to array
    }

It remains to be seen whether this would be a problem in practice or whether, if name-colouring proved popular, people might adjust their code to avoid such anomalies. The problem does not occur if we avoid re-using the same name for items of different kinds at the same lexical level – which IMHO is not a bad rule, anyway.

Implementation Status

I would like to experiment with name-kind colouring for the dfns code samples on the Dyalog website.

This project is under way, but has rather a low priority and so keeps stalling. In addition, I keep wavering between two implementations: processing the tokens as a list, with tail calls and the dictionary as an accumulating left argument, or in more classic APL style as a vector, developing parallel vectors for lexical depth and masks for assignment-arrow positions, expression boundaries, etc.
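For what it’s worth, here is a rough sketch of how the “parallel vectors” approach might begin (a toy source string, not the project’s code):

      src←'{A←÷ ⋄ M←{A ⍵} ⋄ M 3}'      ⍝ a toy source string
      depth←+\(src='{')-0,¯1↓src='}'   ⍝ lexical depth of each character
      arrows←src='←'                   ⍝ mask of assignment arrows
      exprs←(src='⋄')∧depth=1          ⍝ expression boundaries at the outer level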

In the meantime, here is an artist’s impression of what name- and bracket-coloured code might look like, with colours: array, function, dyadic-operator.

    eval←{
        stk I(op ops)←⍵
        op in⊃⍺:⍺ prt stk I((dref op)cat ops)
        f(fr rr)f rstk
        c←{op≡⍵~' '}
        c'dip ':⍺('*⌷'rgs{prt rr I(f cat fr ops)})
        ...

Notice how functions stand out from their arguments. For example, at the start of the third line: op in⊃⍺:, it is clear that in is a function taking ⊃⍺ (first of alpha) as right argument and op as left argument, rather than a monadic function op taking in⊃⍺ (in pick of alpha) as right argument. Other interpretations of the uncoloured expression op in⊃⍺: include:

    op in⊃⍺:    op in is a vector (array), so ⊃ is pick.
    op in⊃⍺:    both op and in are monadic function calls, so ⊃ is first.
    op in⊃⍺:    in is a monadic operator with array operand op.
    op in⊃⍺:    in is a dyadic operator with function operands op and ⊃.

I’m keen to hear any feedback and suggestions about name-colouring for dfns. What d’ya think?

A Dialog on APL

A discussion between Nicolas Delcros and Roger Hui

Nicolas, Prologue: From a language point of view, thanks to Ken Iverson, it is obvious that you want grade rather than sort as a primitive. Yet from a performance point of view, sort is currently faster than grade.

Can one be “more fundamental” than the other? If so, who’s wrong…APL or our CPUs? In any case, what does “fundamental” mean?

Roger: Sorting is faster than grading for reasons presented in the J Wiki essay Sorting versus Grading, in the paragraph which begins “Is sorting necessarily faster than grading?” I cannot prove it in the mathematical sense, but I believe that to be the case on any CPU when the items to be sorted/graded are machine units.

Nicolas, Formulation 1: Now, parsing ⍵[⍋⍵] (and not scanning the idiom) begs the question of how deeply an APL interpreter can “understand” what it’s doing to arrays.

How would an APL compiler resolve this conjunction in the parse tree? Do you simply have a bunch of state pointers such as “is the grade of” or “is sorted” or “is squozen” or “axis ordering” etc. walking along the tree? If so, do we have an idea of the number of such state pointers required to exhaustively describe what the APL language can do to arrays? If not, is there something more clever out there?

Roger: I don’t know of any general principles that can tell you what things can be faster. I do have two lists, one for J and another for Dyalog. A big part of the lists consists of compositions of functions, composition in the mathematical sense, that can be faster than doing the functions one after the other if you “recognize” what the composed function is doing and write a native implementation of it. Sort vs. grade is one example (sort is indexing composed with grade). Another one is (⍳∘1 >) or (1 ⍳⍨ >). The function is “find the first 1” composed with >. These compositions have native implementations and:

      x←?1e6⍴1e6

      cmpx '5e5(1 ⍳⍨ >)x' '(5e5>x)⍳1' 
5e5(1 ⍳⍨ >)x → 0.00E0  |       0%
(5e5>x)⍳1    → 1.06E¯2 | +272100% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

      cmpx '¯1(1 ⍳⍨ >)x' '(¯1>x)⍳1' 
¯1(1 ⍳⍨ >)x → 2.41E¯3 |   0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
(¯1>x)⍳1    → 4.15E¯3 | +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

If you get a “hit” near the beginning, as would be the case with 5e5, you win big. Even if you have to go to the end (as with ¯1), you still save the cost of explicitly generating the Boolean vector and then scanning it to the end.

Another one, introduced in 14.1, is:

      cmpx '(≢∪)x' '≢∪x'
(≢∪)x → 4.43E¯3 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
≢∪x   → 1.14E¯2 | +157% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

This is the tally nub composition, used often in a customer application. If you “know” that you just want the tally of the nub (uniques), you don’t actually have to materialise the array for the nub.

I am not conversant with compiler technology so I don’t know what all an APL compiler can do. I do know that there’s a thing called “loop fusion” where, for example, in a+b×c÷2, it doesn’t have to go through a,b,c in separate loops, but can instead do

    :for i :in ⍳≢a ⋄ z[i]←a[i]+b[i]×c[i]÷2 ⋄ :endfor

saving on the array temp on every step. You win some with this, but I think the function composition approach wins bigger. On the other hand, I don’t know that there is a general technique for function composition. I mean, what general statements can you make about what things can be faster (AKA algebraically simpler)?

Nicolas: I sort of see…so jot is a “direct” conjunction. An indirect conjunction could be ⍵[⍺⍴⍋⍵] where the intermediate grade is reshaped. We “know” that grade and shape are “orthogonal” and can rewrite the formula to ⍺⍴⍵[⍋⍵].
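A quick check of that rewrite on made-up data (the data and the comparison are only illustrative):

      v←3 1 4 1 5 9 2 6
      (v[2 4⍴⍋v]) ≡ 2 4⍴v[⍋v]
1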

So if we can establish a list of flags, and establish how primitives touch these flags, and how these flags affect each other, then we can extend to any path of the parse tree, provided that the intermediate nodes don’t destroy the relationship between the two operations (here grade + indexing provides sort, independently of the reshape).

Of course we can spend our lives finding such tricks. Or we could try and systemise it.

Roger: What you are saying above is that reshape and indexing commute (i.e. reshape indexing ←→ indexing reshape). More generally, compositions of the so-called structural functions and perhaps of the selection functions are amenable to optimisations. This would especially be the case if arrays use the “strided representation” described in Nick Nickolov’s paper Compiling APL to JavaScript in Vector. I used strided representation implicitly to code a terse model of ⍺⍉⍵ in 1987.

Nicolas, Formulation 2: On which grounds did the guys in the ’50s manage to estimate the minimal list of operations that you needed to express data processing?

Roger: APL developed after Ken Iverson struggled with using conventional mathematical notation to teach various topics in data processing. You can get an idea of the process from the following papers:

In our own humble way, we go through a similar process: we talk to customers to find out what problems they are faced with, what things are still awkward, and think about what, if anything, we can do to the language or the implementation to make things better. Sometimes we come up with a winner, for example ⌸ (key). You know, the idea for ⍋ (grade) is that often you don’t just use ⍋x to order x (sort) but you use ⍋x to order something else. Similarly with ⍳, you often don’t want just x⍳y but you use it to apply a function to items with like indices. The J Wiki essay Key describes how the idea arose in applications, and then connected with something I read about, the Connection Machine, a machine with 64K processors (this was in the 1980s).
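For instance (made-up data), ⍋ages is used here to order a different array entirely:

      ages←35 22 48
      heights←182 165 171
      heights[⍋ages]                   ⍝ heights in order of ascending age
165 182 171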

Nicolas, Formulation 3: Do we have something with a wider spectrum than “Turing complete or not” to categorise the “usefulness and/or efficiency” of a language?

Roger: Still no general principles, but I can say this:

  • Study the languages of designers whom you respect, and “borrow” their primitives, or at least pay attention to the idea. For example, =x is a grouping primitive in k. Symbols are Arthur Whitney’s most precious resource and for him to use up a symbol is significant.

    =x ←→ {⊂⍵}⌸x   old k definition

    =x ←→ {⍺⍵}⌸x   current k definition

    Both {⊂⍵}⌸x (14.0) and {⍺⍵}⌸x (14.1) are supported by special code.

  • Study the design of machines. For a machine to make something “primitive” is significant. For example, the Connection Machine has an instruction “generalised beta” which is basically {f⌿⍵}⌸. Around the time the key operator was introduced, we (Ken Iverson and I) realised that it’s better not to build reduction into a partitioning, so that if you actually want to have a sum you have to say +⌿⌸ rather than just +⌸. The efficiency can be recovered by recognising {f⌿⍵}⌸ and doing a native implementation for it. (Which Dyalog APL does, in 14.0.)
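    By way of illustration (made-up data, not from the original discussion), the reduction stays outside the partitioning and {+⌿⍵}⌸ is the sum-by-key form that gets the special code:

          k←'abacb'              ⍝ keys
          v←10 20 30 40 50       ⍝ values
          k{+⌿⍵}⌸v               ⍝ sum of v for each unique key in k
    40 70 40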

Roger, Epilogue: To repeat Nicolas’ question in the opening section, what does “fundamental” mean? (Is grade more fundamental than sort?)

In computability, we can use Turing completeness as the metric. But here the requirement is not computability but expressiveness. I don’t know that there is a definitive answer to the question. One way to judge the goodness of a notation, or of an addition to the existing notation such as dfns or key or rank, is to use it on problems from diverse domains and see how well it satisfies the important characteristics of notation set out in Ken Iverson’s Turing lecture:

  • ease of expressing constructs arising in problems
  • suggestivity
  • subordination of detail
  • economy
  • amenability to formal proofs

I would add to these an additional characteristic: beauty.

Do Functions Know Their Own Names?

Going back a long way, when John Scholes and I were writing version 0 of Dyalog, there was a big discussion about whether functions knew their own names. This discussion still surfaces, with John taking the side that they don’t and me taking the side that they do.

Essentially, John would argue that after A←2, the “2” does not know that it is called “A”. So after (in modern parlance):

      add←{
          ⍺+⍵
      }

the part in {} does not know that it is called “add”.

The real question here can be put in different terms: is the symbol + representing the addition function itself, or is it one of the possible names of the addition function?

From an APL perspective, does this matter? Most of the time it makes no difference. However, when you view your SI stack it does. Consider:

      add←{
          ⍺+⍵
      }
      times←{
          ⍺×⍵
      }
      inner←{
          ⍺ ⍺⍺.⍵⍵ ⍵
      }

Now if we trace into

      1 2 add inner times 3 4

and stop on inner[1], what do we want to see when we type ⍺⍺ in the session? There are two possibilities:

Either you see:

{
    ⍺+⍵
}

or you see:

∇add

Which of these is more useful?

Being more provocative, try putting the functions in a capsule:

[0] foo
[1] 1 2{
[2]     ⍺+⍵
[3] }{
[4]     ⍺ ⍺⍺.⍵⍵ ⍵
[5] }{
[6]     ⍺×⍵
[7] }3 4

and repeatedly trace until [6]:

      )SI
#.foo[6]*
.
#.foo[4]
#.foo[1]

Compare this with the following:

[0] goo
[1] add←{
[2]     ⍺+⍵
[3] }
[4] inner←{
[5]     ⍺ ⍺⍺.⍵⍵ ⍵
[6] }
[7] times←{
[8]     ⍺×⍵
[9] }
[10] 1 2 add inner times 3 4
      )SI
#.times[1]*
.
#.inner[1]
#.goo[10]

In my view, the latter is much more communicative in a debugging environment.

Going back to the version 0 discussion: We didn’t have dfns or dops, so everything was traditional. The discussion was centred around:

∇r←a add b
[1] r←a+b
∇

∇r←a times b
[1] r←a×b
∇

∇ r←a (f inner g) b
[1] r←a f.g b
∇

Now trace this:

      1 2 add inner times 3 4

until we are suspended at times[1].

The key question at the time was whether )SI should show this:

      )SI
#.times[1]*
.
#.inner[1]

or this:

      )SI
#.g[1]*
.
#.inner[1]

We chose the first of these options as more informative.

So naming things is good and using those names when reporting state information is also good. When the issue was disputed, David Crossley (who was managing the development of Dyalog) resolved it using the argument about the )SI output.

These days it might not be so obvious. In those days we were essentially thinking in terms of a scrolling paper terminal; this pre-dates the full-screen experience that even the tty version gives you. We had to wait for Adam Curtis to join the team before we got that. With the context display whilst tracing, there is a stronger argument that the eyes using the debugging information do not need the names. Whilst I admit that this weakens the case, I don’t think it actually changes the balance.

We use a lot of C macros in the interpreter. On Linux, gdb gives us access to those macros when we debug the C code; lldb on macOS, dbx on AIX and Visual Studio on Windows all lack that information and are, therefore, far less helpful.