Exploring Index-Of
Roger Hui

 


The performance of index-of and associates often plays an outsized role in the overall performance of an application. In this workshop we examine the detailed workings of index-of. Topics covered include:

  • extension in 14.0
  • ∊ ∪ ∩ ~
  • x⍳x
  • small-range algorithm
  • 8⌶ , inverted table index-of
  • f⌸
  • (≢∪)x

After the workshop you should be able to answer the following questions:

  • Which cases of x⍳y are fast?
  • Which cases of x⍳y can be faster?

Please bring a computer with Dyalog v14.0 installed in order to do the exercises in the workshop.
 

Exercises

0.  Write a 1-line model of x⍳y (ISO Standard APL version).

1.  Write a 1-line model of x⍳y as extended in Dyalog v14.0.

2.  Write 1-line models of the extended ∊ ~ ∩ ∪ in terms of the extended .

3.  Consider the following model of x⍳y using hashing. (Also at http://www.jsoftware.com/papers/APLHashingModel.htm .) Find integer vectors x and y such that x xiy y takes less than one-tenth the time as that for the model from exercise 0.

 z←x xiy y;⎕io;h;hf;i;j;m;n;q
⍝ model of x⍳y using hashing; written for easy translation into C

 ⎕io←0
 hf←{123457×⍵}                            ⍝ hash function
 n←≢y
 m←≢x
 q←2*1+⌈2⍟m                               ⍝ size of hash table
 h←q⍴m                                    ⍝ hash table; m means "free"
 z←n⍴m                                    ⍝ initialize to "not found"

 :For i :In ⍳m                            ⍝ index for each x
     j←q|hf x[i]                          ⍝ index in hash table
     :While m>h[j] ⋄ :AndIf x[h[j]]≠x[i]  ⍝ stop on finding m or =
         j←q|1+j                          ⍝ the next index
     :End
     :If m=h[j] ⋄ h[j]←i ⋄ :End           ⍝ new hash entry
 :End

 :For i :In ⍳n                            ⍝ index for each y
     j←q|hf y[i]                          ⍝ index in hash table
     :While m>h[j] ⋄ :AndIf x[h[j]]≠y[i]  ⍝ stop on finding m or =
         j←q|1+j                          ⍝ the next index
     :End
     z[i]←h[j]                            ⍝ m=h[j] or x[h[j]]=y[i]
 :End

4.  Write xiyC , an adaptation of xiy , to return the number of comparisons involving x or y .

5.  Modify function xiyC to use a different hash table size. What happens to the number of comparisons?

6.  Modify function xiyC to use a different hash function hf . What happens to the number of comparisons?

7.  Write xix , an adaptation of xiy , to exploit the case where the left and right arguments are the same. Run some benchmarks comparing xix x and x xiy x .

8.  Run some benchmarks on x⍳y for general 4-byte integers and for small-range 4-byte integers. For example:

x y ← ↓u[?2 1e6⍴≢u←    ?8e5⍴2e9]  ⍝ general
s t ← ↓u[?2 1e6⍴≢u←2e9+?8e5⍴1e6]  ⍝ small-range
cmpx 'x⍳y' 's⍳t'

9.  Consider the following model for computing x⍳y for small-range x and y . (Also at http://www.jsoftware.com/papers/APLHashingModel.htm .) Write xiySR , a non-looping version of it.

 z←x xiySRloop y;⎕io;i;max;min;n;t
⍝ small-range version of x⍳y; written for easy translation into C
 ⎕io←0
 n←≢x
 min←⌊/x,y
 max←⌈/x,y
 t←(1+max-min)⍴n                              ⍝ table of indices
 z←(≢y)⍴0                                     ⍝ eventual result
 :For i :In ⌽⍳n ⋄ t[x[i]-min]←i    ⋄ :EndFor  ⍝ populate table
 :For i :In ⍳≢y ⋄ z[i]←t[y[i]-min] ⋄ :EndFor  ⍝ read index for each y

10.  What are some special circumstances worthwhile exploiting in algorithm in the previous exercise?

11.  Write xixSR , an adaptation of xiySR , to exploit the case where the left and right arguments are the same. Run some benchmarks comparing xixSR x and x xiySR y where y←0+x .

12.  The small-range algorithm for x⍳y involves initializing a table. Show how to avoid initializing this table (when translated into C) while retaining the O(1) access time.

13.  (∪x)⍳x , index in nub, is an interesting computation (for example, see Dyalog Language Forum message operator each-nub, 2014-05-29). Suppose x is a vector of 4-byte integers, let’s say x←?1e6⍴2e9 . Compute the index in nub faster than (∪x)⍳x .

14.  Run some benchmarks onon 4-byte integers, 4-column matrices of letters of the alphabet, and 32-column boolean matrices.

15.  Let x and y be vectors of floating point numbers. Compare the times for x⍳y and x{⎕ct←0 ⋄ ⍺⍳⍵}y . Compare their results.

16.  Write a 1-line model for inverted-table index-of.

17.  Run some benchmarks comparing x⍳y and (,⊂x)(8⌶)(,⊂y) .

18.  Write a model of f⌸ .

19.  Adapt xix to compute (≢∪)x . Do the same for xixSR .

20.  Construct a table of the times for ⌈/x , +/x , {⍵[⍋⍵]}x , ⍋x , x⍳y , x⍳x , for x and y in various datatypes.


References

•   Aho, A.V., Hopcroft, J.E., and Ullman, J.D., The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974; Exercise 2.12.
•   Hui, Roger, Index in Nub, J Wiki Essay, 2007-05-25.
•   Hui, Roger, Hashing for Tolerant Index-Of, 2010-09-16.
•   Hui, Roger, Index-Of, A 30-Year Quest, 2014-07-25.
•   Hui, Roger, APL and J Hashing Models, 2014-07-31.
•   Kankowski, P., Benchmarking CRC32 and PopCnt Instructions, 2010.
•   Knuth, D.E., The Art of Computer Programming, Volume 3, Sorting and Searching, Addison-Wesley, 1973.
•   Microsoft, SSE 4 Instructions, Visual Studios 2008, Micrsoft Developer Network, 2008; Compiler Instrinsics _mm_crc32_u8, _mm_crc32_u16, _mm_crc32_u32, and _mm_crc32_u64.
•   Wikipedia, Computation of CRC, 2014.



Presented at Dyalog ’14, 2014-09-21.

created:  2014-05-30 13:45
updated:2014-09-16 06:10