Index-Of on Multiple Floats

Here, the index origin is 0, although the ideas, algorithms, code, and examples work in 1-origin with obvious modifications.
The problem is to efficiently compute x index-of y on real or complex numbers, where the major cells of x have multiple elements, using tolerant comparison. It’s a problem I’ve worked on for more than 20 years [0] without finding a good solution. Until now. For this problem, the best solution is to set ⎕ct to 0, that is, use exact comparisons. The APL user can make this choice depending on the application; the APL implementer cannot. We were motivated to revisit the problem due to a bug
in tolerant unique,
which will be discussed later in this User Meeting
[1].
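Throughout, tolerant equality of a and b means |a-b ≤ ⎕ct×(|a)⌈|b: the magnitude of the difference does not exceed ⎕ct times the larger of the two magnitudes. A rough Python rendering of that test, reused in the sketches below (illustrative only; the name teq and the 1e-14 default tolerance are assumptions, not Dyalog code):

    # A rough sketch of tolerant equality under a comparison tolerance ct
    # (the role played by ⎕ct); illustrative, not Dyalog's implementation.
    def teq(a, b, ct=1e-14):
        """True if |a-b| <= ct * max(|a|, |b|); works for real and complex."""
        return abs(a - b) <= ct * max(abs(a), abs(b))

    print(teq(1.0, 1.0 + 1e-15))        # True under the assumed tolerance
    print(teq(1.0, 1.0 + 1e-15, 0))     # False with exact comparison (ct = 0)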
The current implementation takes time O(n*2). f creates real matrices x and y with ⍵ rows and ⍺ columns, and runs a benchmark on x⍳y. The first expressions are benchmarks on 3-column matrices, with 1e3, 2e3, …, 32e3 rows. The behavior is O(n*2) because the time quadruples when the argument size doubles. In contrast, when the arguments are vectors, the time doubles when the argument size doubles. The last number shows that on 32e3 rows, the vector case is about 2600 times faster than the 3-column case.
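The quadratic behavior is what you would expect from a straightforward implementation that, for each row of y, scans the rows of x with tolerant comparison. A back-of-the-envelope Python sketch of that brute-force scheme (illustrative only, not the actual C implementation):

    # Brute-force index-of on matrices with tolerant comparison, 0-origin.
    # n rows of y, each scanning up to n rows of x: O(n*2) overall.
    def teq(a, b, ct=1e-14):
        return abs(a - b) <= ct * max(abs(a), abs(b))

    def index_of_naive(x, y, ct=1e-14):
        result = []
        for yrow in y:
            for i, xrow in enumerate(x):
                if all(teq(u, v, ct) for u, v in zip(xrow, yrow)):
                    result.append(i)
                    break
            else:
                result.append(len(x))    # "not found", as in APL with ⎕IO=0
        return result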
If the arguments are real vectors, you can sort them
so that (tolerantly) equal elements are contiguous.
You can then put the elements into clusters so that
equal elements are in the same cluster,
using interval index (⍸) and key (⌸).
Within each cluster, the red indices refer to x
and the blue indices refer to y .
The red indices, if any, always precede the blue indices, if any.
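To make the clustering step concrete, here is a rough Python sketch of clustering a sorted real vector so that tolerantly equal elements share a cluster number. It is only a stand-in for the APL formulation with ⍸ and ⌸; because tolerant equality is not transitive, chaining adjacent comparisons may merge more than strictly necessary, which costs efficiency but not correctness:

    # Cluster a sorted real vector: start a new cluster whenever an element
    # is not tolerantly equal to its predecessor (0-origin cluster numbers).
    def teq(a, b, ct=1e-14):
        return abs(a - b) <= ct * max(abs(a), abs(b))

    def clusters_sorted(s, ct=1e-14):
        k, cur = [], 0
        for i, v in enumerate(s):
            if i > 0 and not teq(s[i - 1], v, ct):
                cur += 1
            k.append(cur)
        return k

    s = sorted([1.0, 2.0, 1.0 + 5e-15, 3.0, 2.0 - 5e-15])
    print(clusters_sorted(s))            # [0, 0, 1, 1, 2]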
If the argument cells have multiple elements, you cannot sort them so that equal cells are contiguous.
Jay Foad offered the key insight that the columns can be clustered independently, resulting in an integer array k . You can then cluster the arguments according to k or, equivalently, according to k⍳k , and in the result equal cells are in the same cluster. The monadic case of key can be used to get the indices
in each cluster instead of the actual data.
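A rough Python sketch of that idea (a stand-in only, not the HASH operator): cluster each column of the catenated arguments independently, then use each row's tuple of per-column cluster numbers as its hash key. Tolerantly equal rows are guaranteed to get the same key; unequal rows that happen to share a key only cost some extra comparisons.

    # Sketch of per-column clustering and row hashing (names are made up).
    def teq(a, b, ct=1e-14):
        return abs(a - b) <= ct * max(abs(a), abs(b))

    def column_clusters(col, ct=1e-14):
        order = sorted(range(len(col)), key=lambda i: col[i])
        k, cur = [0] * len(col), 0
        for j in range(1, len(order)):
            if not teq(col[order[j - 1]], col[order[j]], ct):
                cur += 1
            k[order[j]] = cur
        return k

    def row_keys(mat, ct=1e-14):
        cols = [list(c) for c in zip(*mat)]              # columns of mat
        kcols = [column_clusters(c, ct) for c in cols]   # cluster each column
        return [tuple(kc[i] for kc in kcols) for i in range(len(mat))]

    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    y = [[1.0 + 5e-15, 2.0, 3.0]]
    print(row_keys(x + y))    # first and last rows share a key: (0, 0, 0)

Index-of can then bucket the rows of x by key and, for each row of y, compare tolerantly only against the rows in its bucket.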
Putting it all together, we get the operator HASH. The most significant point is actually the name HASH, recognizing that what we are doing is hashing, and therefore the clustering calculation is a hash function. Looking at the definition:
On real matrices of size 1e4 by 3, the APL model is 16 times faster than the current APL primitive.
The most pleasing part
is that the present code is much clearer and shorter than
code from 2010
[2].
If the arguments are complex, the clustering cannot be used as is, because it requires sorting, and a complex array cannot be sorted so that equal numbers are contiguous. But if you sort the magnitudes, then cells with equal magnitudes are contiguous. This works as long as equal complex numbers have equal magnitudes. Using this idea, we get a nice speed-up,
a factor of 40 in this benchmark.
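In the rough Python terms used above (again a stand-in, not the APL code), a complex column would be hashed by clustering its magnitudes:

    # Hash a complex column by clustering magnitudes: complex numbers cannot
    # be sorted directly, but their magnitudes can, and equal complex numbers
    # should have equal magnitudes (the assumption examined below).
    def teq(a, b, ct=1e-14):
        return abs(a - b) <= ct * max(abs(a), abs(b))

    def complex_column_clusters(col, ct=1e-14):
        mags = [abs(z) for z in col]
        order = sorted(range(len(col)), key=lambda i: mags[i])
        k, cur = [0] * len(col), 0
        for j in range(1, len(order)):
            if not teq(mags[order[j - 1]], mags[order[j]], ct):
                cur += 1
            k[order[j]] = cur
        return k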
We were pleased about this until the code ran afoul of some QA written in 2010. The repro can be simplified to the following case: a is a number with real part 1 and imaginary part 1+⎕ct. b is a number with real part 1+⎕ct and imaginary part 1+⎕ct×2. x and y are random vectors of a and b. a and b are equal, so x⍳y should be all 0s.
The clustering/hashing algorithm gives wrong answers!
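For concreteness, a rough Python rendering of the repro (illustrative only; ct = 1e-14 is an assumed value for ⎕ct, and teq is the made-up tolerant test from above):

    # The simplified repro in Python terms (not the original QA script).
    ct = 1e-14                             # assumed comparison tolerance
    a = complex(1, 1 + ct)
    b = complex(1 + ct, 1 + ct * 2)

    def teq(p, q, tol=ct):
        return abs(p - q) <= tol * max(abs(p), abs(q))

    print(teq(a, b))                       # True: a and b are tolerantly equal
    print(abs(a - b))                      # magnitude of the difference
    print(abs(abs(a) - abs(b)))            # difference of the magnitudes
    print(ct * max(abs(a), abs(b)))        # the tolerant-comparison threshold
    # The last three values are within a fraction of a percent of one another,
    # so double-precision rounding decides the magnitude comparison; that is
    # how the magnitudes can compare unequal even though a and b are equal.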
The problem comes down to this: a and b are equal but their magnitudes are not, whence clustering by magnitudes leads to an incorrect result. But it should not be possible for equal numbers to have unequal magnitudes. A little algebra on the definition of tolerant equality shows why: a=b means |a-b ≤ ⎕ct×(|a)⌈|b, while (|a)≠|b means |(|a)-|b > ⎕ct×(|a)⌈|b, so the two comparisons together require |a-b < |(|a)-|b, contradicting the triangle inequality, a fact proven by Euclid more than two thousand years ago. Don’t you just hate it when the QA contradicts proven facts?
Calculations in double precision show that |a-b is indeed less than |(|a)-|b, which is supposed to be impossible. I did some extended precision arithmetic in J [3], using 120 decimal digits in the calculations and 40 decimal digits in the display. The magnitude of the difference is in fact not less than the difference of the magnitudes, but this becomes apparent only after 28 decimal places. What to do?
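The same check can be sketched outside J, for example with Python's mpmath package at 120 decimal digits (again assuming ⎕ct is 1e-14; this is not the J session):

    # High-precision check of the triangle inequality for the repro values,
    # analogous to the extended-precision calculation in J (mpmath assumed).
    from mpmath import mp, mpf, mpc, nstr

    mp.dps = 120                        # 120 decimal digits of working precision
    ct = mpf('1e-14')
    a = mpc(1, 1 + ct)
    b = mpc(1 + ct, 1 + 2 * ct)

    d1 = abs(a - b)                     # magnitude of the difference
    d2 = abs(abs(a) - abs(b))           # difference of the magnitudes
    print(nstr(d1, 40))                 # display 40 digits, as in the J experiment
    print(nstr(d2, 40))                 # the two agree until about the 28th digit
    print(d1 >= d2)                     # True: the triangle inequality does hold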
We cannot afford to resort to heroic efforts like doing the calculations with 120 decimal digits.
Fortunately, there is an easy “trick” which works: use double the tolerance to do the clustering. Clustering is a hash function; lumping together unequal cells merely reduces the efficiency, whereas failing to lump together equal cells would give incorrect answers. Is doubling the tolerance enough?
We don’t (yet) have a proof,
but there is empirical evidence of its correctness.
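In terms of the earlier sketches, the change is small: the clustering (hashing) step compares magnitudes with twice the tolerance used for the comparisons proper. A hedged Python illustration on the repro values (ct = 1e-14 is still an assumption):

    # Hashing with double the tolerance: over-merging clusters only costs
    # efficiency, whereas splitting tolerantly equal cells would cost
    # correctness.  Values a and b are the simplified repro from above.
    ct = 1e-14
    a = complex(1, 1 + ct)
    b = complex(1 + ct, 1 + ct * 2)

    def teq(p, q, tol):
        return abs(p - q) <= tol * max(abs(p), abs(q))

    print(teq(a, b, ct))                 # True: the cells are tolerantly equal
    print(teq(abs(a), abs(b), 2 * ct))   # True: with doubled tolerance the
                                         # magnitudes land in the same cluster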
The HASH APL model has been implemented in C in version 17. The table presents the v16 times divided by v17 times to do x⍳y, for 3-column matrices with 1e3, 4e3, 16e3, and 64e3 rows. The timings depend on the number of clusters. In the (unusual) case of a single cluster,
you pay a penalty for the wasted effort of clustering.
In other cases, the v17 code exhibits linear behavior:
quadrupling the argument size only quadruples the time,
and so the ratios also quadruple versus the O(n*2) v16 code.
That is, the v17 code is “infinitely faster”
than the v16 code.
References