Name Colouring for Dfns

APL is sometimes criticised because expressions that include names cannot, in general, be parsed without knowing whether the names represent functions or variables. For example, the name thing in the expression thing⍳3 could reference an array (in which case the is dyadic) or it could reference a function (making the monadic).

An APL expression becomes completely parsable if we distinguish each name with one of four colours, depending on the “kind” of its referent: array, function, monadic operator, dyadic operator. Now, with a bit of practice, we can at least parse thing⍳3 vs thing⍳3 without recourse to the definition of thing. Notice how kind differs slightly from APL’s name-class, which does not distinguish monadic from dyadic operators.

Names whose definitions are external to the code under consideration and so cannot be classified would be given a distinct colour, say red, which would at least draw attention to them. Colouring a number of interdependent functions at the same time should help with such issues.

Name-colouring can co-exist with the more superficial token-colouring ( green for comments and so forth) though we would probably want to configure the two schemes separately.

There’s a related argument for colouring parentheses to show the kind of value they contain: (~∘(⊂'')). This would mean that we could always determine the valency of a function call, or the kind of a hybrid token, such as /, by looking at the colour of the single token immediately to its left. Finally, we should probably kind-colour curly braces: {}, {⍺⍺ ⍵}, {⍵⍵ ⍵}.

Yeah but how?

In the following, where appropriate, “function” is a shorthand for “function or operator”.

Most generally, we want to process a number of definitions at the same time, so that inter-function references can be coloured. Such a set of functions may be considered as internal definitions in an anonymous outer function so, without loss of generality, we need consider only the case of a single multi-line nested function.

A function can be viewed as a tree, with nested subfunctions as subtrees. The definitions at each lexical level in the function comprise the information stored at each node of the tree.

The colouring process traverses the tree, accumulating a dictionary of (name kind) pairs at each level. At a particular level, definitions are processed in the same order as they would be executed, so that the (name kind) entries from earlier definitions are available to inform later expressions. Visiting each node before traversing its subtrees ensures that outer dictionary entries are available to lexically inner functions. Prefixing dictionary entries for inner functions ensures that a left-to-right search (à la dyadic iota) finds those names first, thus modelling lexical name-shadowing.

Each assignment expression adds one or more names to the dictionary. The kind of the assignment is inferred by looking at the defining expression to the right of the assignment arrow. This sounds heavy, but if we examine the expression from right-to-left then we can often stop after examining very few tokens. For example, an arbitrarily long expression ending in (… function array) must, if it is syntactically correct, reduce to an array value, while (… function function) must be a function (train). A sequence in braces {…} resolves to a function, monadic operator or dyadic operator, depending only on whether there are occurrences of ⍺⍺ or ⍵⍵ at its outermost level.

This approach, while relatively simple, is not perfect. It takes no account of the order in which names are defined relative to their being referenced. The assumption is that all definitions at an outer level are available to its inner levels. The following small function illustrates the problem:

    {
        M{A}      ⍝ this A correctly coloured function
        N{A}⍵     ⍝ this A should be coloured unclassified
        A←÷          ⍝ A defined as a function
        M N          ⍝ application of function to array
    }

It remains to be seen whether this would be a problem in practice or whether, if name-colouring proved popular, people might adjust their code to avoid such anomalies. The problem does not occur if we avoid re-using the same name for items of different kinds at the same lexical level – which IMHO is not a bad rule, anyway.

Implementation Status

I would like to experiment with name-kind colouring for the dfns code samples on the Dyalog website.

This project is under way, but has rather a low priority and so keeps stalling. In addition, I keep wavering between processing the tokens as a list with tail calling and the dictionary as an accumulating left argument, or in more classic APL style as a vector, developing parallel vectors for lexical depth and masks for assignment arrow positions and expression boundaries, etc.

In the meantime, here is an artist’s impression of what name- and bracket- coloured code might look like, with colours: array, function, dyadic-operator.

    eval{
        stk I(op ops)←⍵
        op in⊃⍺:⍺ prt stk I((dref op)cat ops)
        f(fr rr)f rstk
        c{op≡⍵~' '}
        c'dip ':⍺('*⌷'rgs{prt rr I(f cat fr ops)})
        ...

Notice how functions stand out from their arguments. For example, at the start of the third line: op in⊃⍺:, it is clear that in is a function taking ⊃⍺ (first of alpha) as right argument and op as left argument, rather than a monadic function op taking in⊃⍺ (in pick of alpha) as right argument. Other interpretations of the uncoloured expression op in⊃⍺: include:

    op in⊃⍺:    op in is a vector (array), so is pick.
    op in⊃⍺:    both op and in are monadic function calls, so is first.
    op in⊃⍺:    in is a monadic operator with array operand op.
    op in⊃⍺:    in is a dyadic operator with function operands op and .

I’m keen to hear any feedback and suggestions about name-colouring for dfns. What d’ya think?