Solving the 2014 APL Problem Solving Competition – How Tweet It Is

This post is the continuation of the series where we examine some of the problems selected for the 2014 APL Problem Solving Competition.

The problems presented in Phase 1 of the competition were selected because they could be solved succinctly, generally in a single line of APL code. This makes them well suited for experimentation on TryAPL.org.

Problem 2 of Phase 1, entitled “How tweet it is” reads

“Twitter messages have a 140 character limit – what if the limit was even shorter? One way to shorten the message yet retain most readability is to remove interior vowels from its words. Write a dfn which takes a character vector and removes the interior vowels from each word.”

Test cases:
      {your_solution} 'if you can read this, it worked!'
if yu cn rd ths, it wrkd!
      {your_solution} 'APL is REALLY cool'
APL is RLLY cl
      {your_solution} '' ⍝ an empty vector argument should return an empty vector

      {your_solution} 'a' ⍝ your solution should work with a single character message
a

We’ll examine a couple of approaches to this problem – one that’s more “traditional APL” code, and another that makes use of a really helpful Dyalog feature.

This problem could be restated as “find and remove the vowels that aren’t at the beginning or end of a word”. To start with, we need to determine where the words are and where the vowels are. A word is a contiguous set of letters; multiple words are separated by spaces or punctuation. For simplicity’s sake, we’ll ignore contractions and possessives.

The “Traditional APL” Approach

This approach employs a technique that is not commonly found outside of APL and its brethren – using a Boolean vector to determine which elements to remove or keep. First, let’s find where all the vowels are:

      string←'If you can read this, it worked!'
      vowels←{⍵∊'aeiouAEIOU'}
      vowels string
1 0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 1 0 0

To help illustrate what’s happening, I’ll write a little operator called “show” to compactly display the string, the Boolean vector, and the elements that would be selected by applying the Boolean to the string.

      show←{⍵⍪⍵{↑(1 0⍕ ⍵)(⍵\⍵/⍺)}⍺⍺ ⍵}
      vowels show string
If you can read this, it worked!
10001100100011000010001000100100
I   ou  a   ea    i   i   o  e

Next we want to remove vowels that aren’t at either end of a word. First, find where the words are by finding where the letters are.  There are several ways to do this; the most obvious may be to use a character vector constant:

      letters←{⍵∊'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'}

Long character constants seem a bit awkward to me.  So, another technique uses the Unicode Conversion system function to return the 26 characters starting at the code points for each of ‘a’ and ‘A’:

      letters←{⍵∊⎕UCS (⍳26)∘.+¯1+⎕UCS'aA'}

Yet another way might be to use the code point values directly and do numerical operations:

      letters←{{((⍵≥65)∧⍵≤90)∨(⍵≥97)∧⍵≤122}⎕UCS ⍵}

Which technique you choose is largely a matter of taste and style. All three return the same result and have comparable performance. My personal preference is the second one – it has fewer characters for me to mistype 🙂

      letters show string
If you can read this, it worked!
11011101110111101111001101111110
If you can read this  it worked 

So now let’s mark the interior letters of the words. This employs a technique known as shift and compare that I learned in the early 1980s when I was privileged to work with Bob Smith. Among Bob’s many contributions to the APL world was a book on Boolean Functions and Techniques. To mark the interior letters, we’ll do both a right and left shift:

      interior←{⍵∧(¯1↓0,⍵)∧1↓⍵,0}
      {interior letters ⍵} show string
If you can read this, it worked!
00001000100011000110000000111100
    o   a   ea   hi       orke  

The last step is to find interior vowels and negate:

      {(vowels ⍵)∧interior letters ⍵} show string
If you can read this, it worked!
00001000100011000010000000100100
    o   a   ea    i       o  e  

      {(vowels ⍵)⍲interior letters ⍵} show string
If you can read this, it worked!
11110111011100111101111111011011
If y u c n r  d th s, it w rk d!

Putting it all together…

      tweet←{⍵/⍨~(⍵∊'aeiouAEIOU')∧{(1↓⍵,0)∧¯1↓0,⍵}⍵∊⎕UCS(⍳26)∘.+¯1+⎕UCS'aA'}
      tweet string
If yu cn rd ths, it wrkd!

The Other Technique – Using Regular Expressions

In version 13.0, Dyalog introduced the system functions ⎕S and ⎕R as interfaces to the PCRE (Perl Compatible Regular Expression) library. Like APL, regular expressions may seem a bit alien at first, but in the years since their incorporation into Dyalog, I’ve grown to appreciate their power and flexibility – they can frequently accomplish complex string manipulations more succinctly than their APL equivalents thus furthering Dyalog’s power as a tool of thought, notation and execution.

      tweet←{('\B[AEIOU]\B' ⎕R '' ⍠ 1) ⍵}
      tweet string
If yu cn rd ths, it wrkd!

The expression above replaces any vowel (⍠ 1means case-insensitive) that is not at the beginning or end of a word with the empty vector, effectively removing the interior vowels. A blog post is not enough space to give an adequate overview of regular expressions. But I hope the expression above piques your interest and encourages you to experiment with ⎕S and ⎕R on TryAPL.org or with a Dyalog system of your own.

Comments are closed.