Maintaining Py’n’APL Part 2: APL Arrays, Python Objects, and JSON

As part of the bigger, overarching refactoring goal of making Py’n’APL great again, I refactored some of the code that deals with sending data from Python to APL and receiving data from APL into Python. In this blog post, I will describe – to the best of my abilities – how that part of the code works, how it differs from what was in place, and why those changes were made.

The starting point for this blog post is the commit b7d4749.

This blog post is mostly concerned with the files Array.py, ConversionInterface.py, and ObjectWrapper.py (these were the original file names before I tore them apart and moved things around). It does not make much sense to list where all the things went, but you can use GitHub’s compare feature to compare the starting commit for this blog post with the “final” commit for this blog post.

State of Affairs

If you are going to refactor a working piece of code, the first thing you need to do is to make sure that you know what the code is doing! This will help to ensure that your refactoring does not break the functionality of the code. With that in mind, I started working my way through the code.

I started by looking at the file ConversionInterface.py and the two classes Sendable and Receivable that were defined in there. By reading the comments, I understood that these two classes were defining the “conversion interface”. In this context, the word “interface” has approximately the Java meaning of interface: it defines a set of methods that the classes that inherit from these base classes have to implement. For the class Sendable, there are two methods toJSONDict and toJSONString; and for the class Receivable, there is one method to_python.

Even though I had just started, I already had a couple of questions:

  1. Do the names Sendable and Receivable mean that these objects will be sent to/received from APL or from Python respectively?
  2. Why is there a comment next to the definition of Sendable that says that classes that implement a method from_python will inherit from Sendable? Is that a comment that became a lie as the code evolved? If not, why isn’t there a stub for that method in the class itself?

The more I pondered on these questions, the more I started to think that the “conversion interface” isn’t necessarily about the sending to/receiving from APL, but rather the conversion of built-in Python types to helper classes like APLArray or APLNamespace (from the file Array.py) and back. So, it might be that Sendable and Receivable are supposed to be base classes for these helper classes, telling us which ones can be converted to/from built-in Python types. I needed to solve this conundrum before I could prepare these two base classes and use Python mechanisms to enforce these “interfaces”.

What the Interface Really Means

After playing around with the code a bit more, I felt more confident that Sendable should be inherited by classes that represent things that can be sent to APL and Receivable represents things that can be received from APL. However, it must be noted that Py’n’APL doesn’t send Python built-in types directly to APL. Whenever we want to send something to APL, Py’n’APL first converts it to the suitable intermediate (Python) class. For example, lists and tuples are converted to APLArray, and dictionaries are converted to APLNamespace.

If an APLArray instance is supposed to be sendable to APL, we must first be able to build it from the corresponding Python built-in types, and that is why almost all Sendable subclasses also implement a method from_python. Looking at it from the other end of the connection, Receivable instances come from APL and Py’n’APL starts by taking the JSON and converting it into the appropriate APLArray instances, APLNamespace instances, etc. Only then can we convert those intermediate representations to Python, and that is why all Receivable subclasses come with a method to_python. In addition, those Receivable instances come from APL as JSON, so we need to be able to instantiate them from JSON. That is why Receivable subclasses also implement a method fromJSONString, although that is not defined in the Receivable interface.

So, we have established that APL needs to know how to make sense of Python’s objects and Python needs to know how to make sense of APL’s arrays. (In Python, everything is an object, and in APL, everything is an array. In less precise – but maybe clearer – words, Python needs to be able to handle whatever APL passes to it, and APL needs to be able to handle whatever Python passes to it.) To implement this, we need to determine how Python objects map to APL arrays and how APL arrays map to Python objects. This is not trivial, otherwise I wouldn’t be writing about it! Here are two simple examples showing why this is not trivial:

  • Python does not have native support for arrays of arbitrary rank.
  • APL does not have a key-value mapping type like Python’s dict.

To solve the issues around Python and APL not having exactly the same type of data, we create lossless intermediate representations in both host languages. For example, Python needs to have an intermediate representation for APL arrays so that we can preserve rank information in Python. When possible, intermediate representations should know how to convert into the closest value in the host language. For example, the Python intermediate representation of a high-rank APL array should know how to convert itself into a Python list.

I began by looking at the handling of APL arrays and namespaces. These are the conversions that need to be in place:

  • APL arrays ←→ Python lists
  • APL arrays ← arbitrary Python iterables
  • APL namespaces ←→ Python dictionaries

When sending data from the Python side, it first needs to be converted into an instance of the appropriate APLProxy subclass. For example, a dictionary will be converted into an instance of APLNamespace. That object is converted to JSON, which is then sent to APL. APL receives the JSON and looks for a special field __extended_json_type__, which identifies the type of object. In this example, that is "APLNamespace". APL then uses that information to decode the JSON data into the appropriate thing (a namespace in this example).

When sending data from the APL side, a similar thing happens. First, the object is converted into a namespace that ⎕JSON knows how to handle. For example, an array becomes a namespace with attributes shape (the shape of the original array) and data (the ravel of the original array); the namespace is tagged with an attribute __extended_json_type__, which is a simple character vector informing Python what the object is. That namespace gets converted to JSON with ⎕JSON, and the JSON is sent to Python. Python receives the JSON and decodes it into a Python dictionary. Python then uses __extended_json_type__ to determine the actual object that the dictionary represents (an array, in our example) and uses the information available to build an instance of the appropriate APLProxy subclass (APLArray in this example).

Github commit 40523b9 shows one initial implementation of the APL code that takes APL arrays and namespaces and converts them into namespaces that ⎕JSON can handle and that Python knows how to interpret. This commit also shows the APL code for the reverse operation. For now, this APL code lives in the file Proxies.apln and the respective Python code lives in the file proxies.py. Everything is ready for me to hook this into the Py’n’APL machinery so that Py’n’APL uses this mechanism to pass data around…but that’s for another blog post!

Summary of Changes

GitHub’s compare feature shows all the changes I made since the commit that was the starting point for this post. The most notable changes are:

  • Moving the contents of ConversionInterface.py and ObjectWrapper.py into Array.py.
  • Adding the file proxies.py that will have the Python code to deal with the JSON and conversions, which will end up replacing most of the code I mentioned in the previous bullet point.
  • Adding the file Proxies.apln that will have the APL code to deal with the JSON and conversions, which will end up replacing a chunk of code that currently lives in Py.dyalog, which is a huge file with almost all of the Py’n’APL APL code.

Blog posts in this series:

Maintaining Py’n’APL Part 1: The Beginning

Py’n’APL is an interface between APL and Python that allows you to run Python code from within APL and APL code from within Python. This interface was originally developed by Dyalog Ltd intern Marinus Oosters, who presented it in a webinar and at Dyalog ’17. I subsequently talked about Py’n’APL at Dyalog ’21, where I promised to update it and make it into an awesome and robust tool.

I’ve now stared at Py’n’APL’s code base for longer than I’m proud to admit, but without any proper goals and some basic project management this has been as effective in cleaning it up as a Magikarp’s Splash – in other words, it has had no effect.

For that reason, and in another attempt to take up the maintenance of Py’n’APL, I have decided to start blogging about my progress. This will be a way for me to share with the world what it feels like to take up the maintenance of a project that you aren’t necessarily very familiar with.

(By the way, Py’n’APL is open source and has a very permissive licence. This means that, like me, you can also stare at the source code; it also means that you can go to GitHub, star the project, fork it, and play around with it!)

Tasks

There are some obvious tasks that I need to do, like testing Py’n’APL thoroughly. This will help make Py’n’APL more robust, it will certainly uncover bugs, and it will help me to document Py’n’APL capabilities. The Python side will be tested with pytest and the APL side will be tested with CITA, which is a Continuous Integration Tool for APL.

The code base also needs to be updated. Py’n’APL currently supports Python 2 up to Python 3.5. At the time of writing this blog post, Python 2 has been in end-of-life for more than 2 years and Python 3.7 is reaching end of life in a couple of months. In other words, there is no overlap between the original Python versions supported and the Python versions that an application should currently support. In addition, Dyalog has progressed from v16.0 to v18.2, and the new tools available with the later versions are also likely to be useful.

Another big thing that should be done (and that would pay high dividends) is to update the project management of the Python part of Py’n’APL. By using the appropriate tooling, we make it easier to clone the (open source) repository so that others can poke around, play with it, modify it, and/or contribute.

The First Commits

Let GitHub commit 4283176f4ffd7f1067f216c1459306cdbc49189a be the starting point of my documented journey. At this point in time, I have two handfuls of commits on the branch master that fixed a (simple) issue with a Python import and added the usage examples I showed at Dyalog ’21. So, what will my first commits look like?

Setting up Poetry

The first thing I decided to do was to set up Poetry to manage the packaging and dependencies of the Python-side of code. By using Poetry, isolating whatever I do to/with the Python code from all the other (Python) things I have on my computer becomes trivial and it makes it very easy to install the package pynapl on my machine.

Auto-Formatting the Source Code

Another thing that I did was to use black (which I added as a development dependency to Poetry) to auto-format all the Python code in the repository. I imagine that this might sound surprising if you come from a different world! But if you look at the commit in question, you will see that although that commit was a big one, the changes were only at the level of the structure of the source code; by using a tool like black, I can play with a code base that is consistently formatted and – most importantly – that is formatted like every other Python project I have taken a look at. This consistency in the Python world makes it easier to read code, because the structure of the code on the page is always the same. This means that there is one less thing for my brain to worry about, which my brain appreciates!

In a typical Python project using black, or any other formatter, the idea is that the formatter is used frequently so that the code always has that consistent formatting style; the idea is not to occasionally insert an artificial commit that is just auto-formatting.

Fixing (Star) Imports

The other major minor change that I made was fixing (star) imports across the Python source code. Star imports look like from module_name import * and are like )LOADing a whole workspace in APL – you will gain access to whatever is inside the workspace you loaded. In Python, star imports are typically discouraged because after a star import you have no idea what names you have available, nor do you know what comes from where, which can be confusing if you star imported multiple modules. Instead, if you need the tools foo and bar from the module module_name, you should import the module and use the tools as module_name.foo and module_name.bar, or import the specific names that you need: from module_name import foo, bar.

I therefore went through the Py’n’APL Python source code and eliminated all the star imports, replacing them by the specific imports that were needed. (OK, not quite all star imports; the tests still need to be reworked.) As well as fixing star imports, I also reordered the imports for consistency and removed imports that were no longer needed.

Python 2-Related Low-Hanging Fruit

To get started with my task of removing old Python 2 code, I decided to start with some basic trimming. For example, there were plenty of instances where the code included conditional assignments that depended on the major version of Python (2 or 3) that were supposed to homogenise the code, making it look as much as possible like Python 3. I could remove those because I know we will be running Python 3. Another fairly basic and inconsequential change I could make was removing the explicit inheriting from object when creating classes (this was needed in Python 2, but not in Python 3).

Explicit Type Checking and Duck Typing

Python is a dynamically-typed language, and sometimes you might need to make use of duck typing to ensure that you are working with the right kind of objects. At Dyalog Ltd we are very fond of ducks, but duck typing is something else entirely:

If it walks like a duck and if it quacks like a duck then it must be a duck.

In other words, in Python we tend to care more about what an object can do (its methods) than what the object is (its type). The Py’n’APL source code included many occurrences of the built-in type and I went through them, replacing them with isinstance to implement better duck typing.

What Happens Next?

These are some of the main changes that I have made so far; they happen to be mostly inconsequential and all on the Python side of the code. Of course, I won’t be able to maintain Py’n’APL by only making inconsequential changes, so more substantial changes will come next. I also need to take a look at the APL code and see what can and what needs to be done there. Although I haven’t looked at the APL code as much as at the Python code, I have a feeling that I will not need to make as many changes there. Fingers crossed!

This blog post covers (approximately) the changes included in this GitHub diff.