Name Colouring for Dfns

APL is sometimes criticised because expressions that include names cannot, in general, be parsed without knowing whether the names represent functions or variables. For example, the name thing in the expression thing⍳3 could reference an array (in which case the is dyadic) or it could reference a function (making the monadic).

An APL expression becomes completely parsable if we distinguish each name with one of four colours, depending on the “kind” of its referent: array, function, monadic operator, dyadic operator. Now, with a bit of practice, we can at least parse thing⍳3 vs thing⍳3 without recourse to the definition of thing. Notice how kind differs slightly from APL’s name-class, which does not distinguish monadic from dyadic operators.

Names whose definitions are external to the code under consideration and so cannot be classified would be given a distinct colour, say red, which would at least draw attention to them. Colouring a number of interdependent functions at the same time should help with such issues.

Name-colouring can co-exist with the more superficial token-colouring ( green for comments and so forth) though we would probably want to configure the two schemes separately.

There’s a related argument for colouring parentheses to show the kind of value they contain: (~∘(⊂'')). This would mean that we could always determine the valency of a function call, or the kind of a hybrid token, such as /, by looking at the colour of the single token immediately to its left. Finally, we should probably kind-colour curly braces: {}, {⍺⍺ ⍵}, {⍵⍵ ⍵}.

Yeah but how?

In the following, where appropriate, “function” is a shorthand for “function or operator”.

Most generally, we want to process a number of definitions at the same time, so that inter-function references can be coloured. Such a set of functions may be considered as internal definitions in an anonymous outer function so, without loss of generality, we need consider only the case of a single multi-line nested function.

A function can be viewed as a tree, with nested subfunctions as subtrees. The definitions at each lexical level in the function comprise the information stored at each node of the tree.

The colouring process traverses the tree, accumulating a dictionary of (name kind) pairs at each level. At a particular level, definitions are processed in the same order as they would be executed, so that the (name kind) entries from earlier definitions are available to inform later expressions. Visiting each node before traversing its subtrees ensures that outer dictionary entries are available to lexically inner functions. Prefixing dictionary entries for inner functions ensures that a left-to-right search (à la dyadic iota) finds those names first, thus modelling lexical name-shadowing.

Each assignment expression adds one or more names to the dictionary. The kind of the assignment is inferred by looking at the defining expression to the right of the assignment arrow. This sounds heavy, but if we examine the expression from right-to-left then we can often stop after examining very few tokens. For example, an arbitrarily long expression ending in (… function array) must, if it is syntactically correct, reduce to an array value, while (… function function) must be a function (train). A sequence in braces {…} resolves to a function, monadic operator or dyadic operator, depending only on whether there are occurrences of ⍺⍺ or ⍵⍵ at its outermost level.

This approach, while relatively simple, is not perfect. It takes no account of the order in which names are defined relative to their being referenced. The assumption is that all definitions at an outer level are available to its inner levels. The following small function illustrates the problem:

    {
        M{A}      ⍝ this A correctly coloured function
        N{A}⍵     ⍝ this A should be coloured unclassified
        A←÷          ⍝ A defined as a function
        M N          ⍝ application of function to array
    }

It remains to be seen whether this would be a problem in practice or whether, if name-colouring proved popular, people might adjust their code to avoid such anomalies. The problem does not occur if we avoid re-using the same name for items of different kinds at the same lexical level – which IMHO is not a bad rule, anyway.

Implementation Status

I would like to experiment with name-kind colouring for the dfns code samples on the Dyalog website.

This project is under way, but has rather a low priority and so keeps stalling. In addition, I keep wavering between processing the tokens as a list with tail calling and the dictionary as an accumulating left argument, or in more classic APL style as a vector, developing parallel vectors for lexical depth and masks for assignment arrow positions and expression boundaries, etc.

In the meantime, here is an artist’s impression of what name- and bracket- coloured code might look like, with colours: array, function, dyadic-operator.

    eval{
        stk I(op ops)←⍵
        op in⊃⍺:⍺ prt stk I((dref op)cat ops)
        f(fr rr)f rstk
        c{op≡⍵~' '}
        c'dip ':⍺('*⌷'rgs{prt rr I(f cat fr ops)})
        ...

Notice how functions stand out from their arguments. For example, at the start of the third line: op in⊃⍺:, it is clear that in is a function taking ⊃⍺ (first of alpha) as right argument and op as left argument, rather than a monadic function op taking in⊃⍺ (in pick of alpha) as right argument. Other interpretations of the uncoloured expression op in⊃⍺: include:

    op in⊃⍺:    op in is a vector (array), so is pick.
    op in⊃⍺:    both op and in are monadic function calls, so is first.
    op in⊃⍺:    in is a monadic operator with array operand op.
    op in⊃⍺:    in is a dyadic operator with function operands op and .

I’m keen to hear any feedback and suggestions about name-colouring for dfns. What d’ya think?

12 thoughts on “Name Colouring for Dfns

  1. I think this sort of syntax highlighting would be quite cool. Let me throw in some alternative ideas:

    * Rainbow parens: https://i.stack.imgur.com/Hlctr.png I think this would be good for nested dfns, particularly if ⍺⍵∇: match the colour of their enclosing {}

    * Highlight each name in a unique colour, so it’s easy to spot its usages. https://medium.com/@evnbr/coding-in-color-3a6db2743a1e

    * Highlight scopes. For instance in “a←1⋄{a⊢←2}” the bit in braces except for the “a” can be of a different colour than rest. Having both “a”-s in the same colour indicates that none of them shadows the other. https://www.youtube.com/watch?v=dkZFtimgAcM#t=15m58s

    Disclaimer: I am not an APL programmer and I can’t judge how useful each of these would be in practice.

  2. > depending on the “kind” of its referent: array, function, monadic operator, dyadic operator

    It _looked_ like the words for the different kinds were in different colors, but I wasn’t sure (other than that “function” was in blue), so I had to view the page source to find out that “array” is in green, “monadic operator” is in navy blue, and “dyadic operator” is in purple. (I am not color blind that I know of.) Increasing the font size also made the colors distinguishable, but it had to be 150% or more. I don’t think I’d like to look at code blown up to that size. I realize that the actual colors will probably be configurable, but for myself all the different colors will be too distracting.

    An alternative is to display the “kind” (and other) information when you hover over the name with the mouse, the facility provided by the mark-up.

  3. Ah, I see the blog post page supports the <acronym> mark-up, and that messed up the last phrase in my previous comment.

    x
    foo

    Hover over each of the above to get its “kind”.

  4. What prompted this post was that I couldn’t shake reading op in⊃⍺ as “op applied to the in pick of ” although I knew intellectually that in was a function and so ⊃⍺ was “first of alpha”.

    I wanted a mechanism that would send my mental parser in the right direction without having to think about it, in fact without being aware it was happening. Name-colouring seemed to offer some hope.

    I accept Roger’s comments that the colours chosen in the post are less than ideal. I think Morten had some good suggestions for improvement.

    • In the specific case of op in⊃⍺, would it help to put a blank in front of the when in is a function? That is, op in ⊃⍺.

      • In this specific case yes, the blank would help greatly. But if the token to the right of in were a named function, say, it wouldn’t help. I can’t off-hand think how spacing might help with other parsing ambiguities, such as fn/¨... vs vec/¨....

    • Glad you liked the ideas – I will try to post them here soon but I need colouring and other features which I don’t know how to put in comments; I may need to respond with another blog entry :-).

  5. Raul’s suggestion is interesting. If I understand him it means having the interpreter record the actual kind associated with each name, as it is dereferenced at runtime, and colouring the code accordingly. This would involve fairly major surgery to the interpreter, so we’d need to feel that there was a strong appetite for name-colouring or something similar before we embarked on it.

  6. Re: Morten’s follow-up post. I prefer Morten’s suggestion of using emphasis (function, monadic operator, and dyadic operator) instead of using color. It would also be good to be able to turn it on and off conveniently (preferably one click).

    • I too like Morten’s suggestion of using italic and bold fonts to identify name-kinds. We could still refer to the concept as “colouring” in the same abstract sense as “register colouring” unless someone has a better term. I agree with Roger that we would need a nifty way to turn it on and off.

      • Ah, like “color” in quantum chromodynamics? Perhaps the single word “color” can denote both “color” and “kind” in your usage?

  7. We requested more freedom wrt syntax highlighting some time ago. Reason being that we would like to use the output of our static code analyser wrt kind and ‘vartype’, and more importantly be able to provide more intellisense based upon the attributes of the references – of course in realtime. We get basically all our knowledge about kind and type from our parser.

    It will be interesting to see what you end up with – I hope we get the option of customizing basically everything!