Inverted Tables

Here, the index origin is 0, although the ideas, algorithms, code, and examples work in 1-origin with obvious modifications. As well, we assume ⎕ct←0.
If ⎕ct≠0, that is, if tolerant comparison is used, computing the analogs of ⍳ and related functions on tables is a difficult problem. It can be solved using techniques described in Index-Of on Multiple Floats in Dyalog ’17, but is beyond the scope of this note.
A table is 2-dimensional data in which the items of a column have the same data type; an inverted table represents the same data as a vector whose items are simple arrays, one per column. An inverted table is more agreeably displayed by application of ⍪¨. X is an example of a table; it has 8 rows and 5 columns. T is the equivalent inverted table. The function invert applies to a table to produce the equivalent inverted table. (We sometimes call a table a “verted table”, to clearly distinguish it from an inverted table.)
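The construction of X does not appear in this text, so here is a sketch that recreates the same rows (they can be read off the displays later in this note), using the invert function from the collected definitions at the end:

      ⎕io ← 0                            ⍝ index origin 0, as assumed throughout
      X ← 0 5⍴0                          ⍝ start with an empty verted table
      X ⍪← 'Smith'  'John'    2 23 1.25
      X ⍪← 'Jones'  'Dakota'  1 29 0.97
      X ⍪← 'Chan'   'Wilson'  2 47 2.11
      X ⍪← 'Wilson' 'Diana'   1 23 1.25
      X ⍪← 'Saxon'  'Joan'    1 31 2.8
      X ⍪← 'Angelo' 'Roberto' 2 19 1.11
      X ⍪← 'Smits'  'Jack'    2 27 3.14
      X ⍪← 'Franck' 'Donna'   1 38 2.72
      T ← invert X
      ⍪¨ T
┌──────┬───────┬─┬──┬────┐
│Smith │John   │2│23│1.25│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │2│47│2.11│
│Wilson│Diana  │1│23│1.25│
│Saxon │Joan   │1│31│2.8 │
│Angelo│Roberto│2│19│1.11│
│Smits │Jack   │2│27│3.14│
│Franck│Donna  │1│38│2.72│
└──────┴───────┴─┴──┴────┘

In T the two character columns are padded to fixed widths 6 and 7, and the three numeric columns are simple vectors.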
A verted table can be created by repeated comma-bar assignment (⍪←, as in the sketch above), and an inverted table by repeated comma-bar-each assignment (⍪¨←); a sketch of the latter follows the puzzles below. Note that for an inverted table, items which are vectors must be fixed-length vectors.

Two puzzles. The first is for Array Notation gurus: how would you express an inverted table in Array Notation? The second is for APL2 array-prototype fanatics: what is the prototype for an inverted table? For that matter, what is the prototype for a verted table?
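Continuing the sketch, an inverted table can also be built directly, one row at a time, with ⍪¨← (the name iv is mine); each character item must already have its column's fixed width:

      iv ← (0 6⍴'') (0⍴0)          ⍝ one empty array per column: 6-wide names and numbers
      iv ⍪¨← (6↑'Chan') 47         ⍝ pad the name to the fixed width 6
      iv ⍪¨← (6↑'Saxon') 31
      ⍪¨ iv
┌──────┬──┐
│Chan  │47│
│Saxon │31│
└──────┴──┘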
An inverted table is more efficient in space than a verted table, in this case by a factor of about 10, and it is usually more efficient in time as well. The first and second benchmarks are queries. The third benchmark finds the index of one table in another; the function 8⌶ implements index-of on inverted tables. The fourth benchmark indexes into a table; in such indexing a verted table is faster than an inverted table, but in both cases the computation takes a tiny amount of time.
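The benchmark expressions themselves are not reproduced here. As an illustration of the kind of query being timed, here is a sketch using the X and T built above (the name age is mine, not part of the original code), selecting the rows with age 23:

      age ← 3⊃T                    ⍝ the age column of the inverted table is a simple vector
      ⍪¨ (⊂age=23) ⌿¨ T            ⍝ compress every column with the same mask
┌──────┬───────┬─┬──┬────┐
│Smith │John   │2│23│1.25│
│Wilson│Diana  │1│23│1.25│
└──────┴───────┴─┴──┴────┘

The corresponding query on the verted table, (23=X[;3])⌿X, must chase a pointer for every item it touches.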
Alas, the news isn’t all good. Appending a new row to a verted table is much more efficient than doing the same to an inverted table. The benchmark on X1 versus T1 shows that ⍪← is supported by special code but ⍪¨← is not. An RFE has been filed on this.
Internal Representation: Verted

The internal representations explain why there are such vast differences in space and time. In APL, each item of a nested array is a pointer to an array, and every array has a header and data. On a 64-bit system, a pointer takes 64 bits. The size in bytes of a simple array is:

      size ← {24 + (8×≢⍴⍵) + 8×⌈64÷⍨(×/⍴⍵)×⌊0.1×⎕dr ⍵}

The components of size are, from right to left: ⌊0.1×⎕dr ⍵ is the number of bits per item; ×/⍴⍵ is the number of items, so their product is the total number of data bits, rounded up (⌈64÷⍨) to whole 64-bit units of 8 bytes each; 8×≢⍴⍵ is 8 bytes for each element of the shape; and 24 bytes for the rest of the header.
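For example (a sketch; these figures simply evaluate the formula, using the fact that ⎕dr of an 8-bit character array is 80, so ⌊0.1×⎕dr gives 8 bits per character):

      size 'Smith'      ⍝ 24-byte header + 8 bytes of shape + 40 bits of data rounded up to one 64-bit unit
40
      size ?8⍴0         ⍝ 24 + 8 + 64: eight 64-bit floats
96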
The space for all those headers adds up. The same memory layout also explains why verted tables are worse in time: the data for the verted table items are scattered in memory, and access to an item requires going through a pointer.

This “mechanical sympathy” with APL implementations suggests that it might be beneficial to have an i-beam (a system facility) to compact the array items of a nested array. (This is different from workspace compaction, which coalesces free blocks of memory but does not reorder existing arrays.) The following benchmarks support this argument:

     ∇ foo n;_
       p←(n,8)⍴⊂⍬
       _←{0⊣p[;⍵]←⍕¨?n⍴2e9}¨⍳8
       q←⍕¨?(n,8)⍴2e9
       r←(n,8)⍴⊂⍬
       r[n?n;]←⍕¨?(n,8)⍴2e9
     ∇

foo 1e6 creates verted tables p, q, and r with the same size, 1e6 rows and 8 columns. The columns of p are created one at a time; q is created all at once; and the rows of r are assigned in random order.
      ⎕wa                     ⍝ force workspace compaction
2069598432
      foo 1e6
      ⎕wa
735192904
      cmpx 'p[;2]' 'q[;2]' 'r[;2]'
  p[;2] → 1.88E¯2 |    0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* q[;2] → 4.05E¯2 | +115% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
* r[;2] → 5.60E¯2 | +197% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

The array compactor should allow optimization for row-major or column-major access.
Internal Representation: Inverted

In contrast, for an inverted table each column is a single simple array, so the pointer and header overheads are incurred once per column rather than once per item. Moreover, the data for each column are contiguous in memory, resulting in faster access.

We saw earlier that vector data items for an inverted table must be fixed-length. In a verted table, the overhead for a vector data item is 8 bytes for the pointer plus 32 bytes for the header. Therefore, the padding required for fixed-length vectors is “wasteful” only if the average length is greater than 40.
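A back-of-the-envelope comparison using the size function above, for a single column holding eight 6-character surnames (a sketch; real figures also include allocator overheads):

      size 8 6⍴' '                   ⍝ inverted: one simple character matrix
88
      (8×8) + +/ size¨ 8⍴⊂6⍴' '       ⍝ verted: eight pointers plus eight separate 6-item vectors
384

For names this short, the 40 bytes of per-item overhead dwarf any padding, which is where the space advantage quoted earlier comes from.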
The remainder of this note presents utilities for working with inverted tables. If a table has “key” columns, then computations such as index-of should be applied to those columns rather than to the entire table.

tassert asserts that the argument is an inverted table: a non-empty vector whose items are simple arrays with the same number of major cells.
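As a quick check (a sketch using the T built earlier), the three conditions tested by tassert, whose definition is in the collected definitions at the end, can be evaluated directly:

      (1≤≢T)∧1=⍴⍴T      ⍝ non-empty vector
1
      (⊃=⊢)≢¨T          ⍝ the same number of major cells in each item
1 1 1 1 1
      2=≡T              ⍝ a nested vector whose items are simple
1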
⍺ tindex ⍵ indexes row(s) ⍺ from the inverted table ⍵.

      tindex ← {(⊂⊂⍺)⌷¨⍵}

      ⍪¨ 3 1 1 2 tindex T
┌──────┬───────┬─┬──┬────┐
│Wilson│Diana  │1│23│1.25│
│Jones │Dakota │1│29│0.97│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │2│47│2.11│
└──────┴───────┴─┴──┴────┘

Index-of, and utilities defined in terms of it, require non-tolerant comparison (if there are floating-point data, then ⎕ct must be 0). One method for doing index-of is to replace each column i of both arguments by its indices in column i of T, and then compute ⍳ as usual on the transposed tables of integers.

      S ← 3 1 1 2 tindex T
      ⍉ ↑ T ⍳¨ S
3 3 1 0 0
1 1 1 1 1
1 1 1 1 1
2 2 0 2 2
      ⍉ ↑ T ⍳¨ T
0 0 0 0 0
1 1 1 1 1
2 2 0 2 2
3 3 1 0 0
4 4 1 4 4
5 5 0 5 5
6 6 0 6 6
7 7 1 7 7
      (⍉ ↑ T ⍳¨ T) ⍳ (⍉ ↑ T ⍳¨ S)
3 1 1 2

      tix1 ← {(⍉↑⍺⍳¨⍺) ⍳ (⍉↑⍺⍳¨⍵)}

      T tix1 S
3 1 1 2

The function 8⌶ is available for doing index-of on inverted tables. (8 “i-beam”; 8 ←→ IIX, Inverted table IndeX, get it? :-)

      tix ← 8⌶

      T tix S
3 1 1 2

The i-beam function is faster than the APL-defined function tix1, and faster than the equivalent computation on a verted table, which runs at the speed it does only because Dyalog APL recognizes it. Index-of on a general nested matrix is slower.

      cmpx 'TT tix TT' 'TT tix1 TT' 'XX ⍳ XX'
  TT tix TT  → 9.38E¯4 |    0% ⎕⎕⎕⎕⎕⎕⎕
  TT tix1 TT → 4.27E¯3 | +355% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
  XX ⍳ XX    → 1.61E¯3 |  +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
      QQ←XX ⋄ QQ[9999;]←34     ⍝ QQ is a general nested matrix
      cmpx 'TT tix TT' 'XX ⍳ XX' 'QQ ⍳ QQ'
  TT tix TT → 9.34E¯4 |     0% ⎕⎕⎕
  XX ⍳ XX   → 1.61E¯3 |   +71% ⎕⎕⎕⎕⎕
* QQ ⍳ QQ   → 1.03E¯2 | +1004% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕

Membership

⍺ teps ⍵ computes membership: it flags the rows of ⍺ which are rows of ⍵, that is, those whose indices ⍵ tix ⍺ are less than the number of rows in ⍵.

      teps ← {(≢⊃⍵) > ⍵ tix ⍺}

      T teps S
0 1 1 1 0 0 0 0

Without

⍺ twithout ⍵ are the rows of ⍺ which are not rows of ⍵. Thus:

      twithout ← {⍺ ⌿¨⍨ ⊂~⍺ teps ⍵}

      ⍪¨ T twithout S
┌──────┬───────┬─┬──┬────┐
│Smith │John   │2│23│1.25│
│Saxon │Joan   │1│31│2.8 │
│Angelo│Roberto│2│19│1.11│
│Smits │Jack   │2│27│3.14│
│Franck│Donna  │1│38│2.72│
└──────┴───────┴─┴──┴────┘

Unique

The unique rows of an inverted table obtain by an analog of the well-known idiom for finding the unique elements of a vector.

      tunique ← {⍵ ⌿¨⍨ ⊂(⍳≢⊃⍵)=tix⍨ ⍵}

      T ≡ tunique 2⌿¨T
1

Key

⍺ and ⍵ are inverted tables; the rows of ⍺ specify keys for the corresponding rows of ⍵. Then ⍺ (⍺⍺ tkey) ⍵ applies ⍺⍺ to the rows of ⍵ having identical keys.

      tkey ← {(⊂tix⍨⍺) ⍺⍺⌸¨ ⍵}

      ⍪¨ T
┌──────┬───────┬─┬──┬────┐
│Smith │John   │2│23│1.25│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │2│47│2.11│
│Wilson│Diana  │1│23│1.25│
│Saxon │Joan   │1│31│2.8 │
│Angelo│Roberto│2│19│1.11│
│Smits │Jack   │2│27│3.14│
│Franck│Donna  │1│38│2.72│
└──────┴───────┴─┴──┴────┘
      ⍪¨ T[,2] {⊂⍵} tkey T[2 3 4]
┌─────────┬─────────────┬─────────────────────┐
│┌───────┐│┌───────────┐│┌───────────────────┐│
││2 2 2 2│││23 47 19 27│││1.25 2.11 1.11 3.14││
│├───────┤│├───────────┤│├───────────────────┤│
││1 1 1 1│││29 23 31 38│││0.97 1.25 2.8 2.72 ││
│└───────┘│└───────────┘│└───────────────────┘│
└─────────┴─────────────┴─────────────────────┘
      ⍪¨ T[,2] {⌈/⍵} tkey T[2 3 4]
┌─┬──┬────┐
│2│47│3.14│
│1│38│2.8 │
└─┴──┴────┘

An inverted table can be graded as follows: start with the identity permutation and proceed over the columns from right to left; at each step, the next permutation is the current permutation reordered by the grade of the current column (itself reordered by the current permutation). This is “lexicographic grading”.
      tgr ← {⊃ {⍵[⍋(⊂⍵)⌷⍺]}/ ⍵,⊂⍳≢⊃⍵}

      tgr T
5 2 7 1 4 0 6 3
      ⍪¨ (tgr T) tindex T
┌──────┬───────┬─┬──┬────┐
│Angelo│Roberto│2│19│1.11│
│Chan  │Wilson │2│47│2.11│
│Franck│Donna  │1│38│2.72│
│Jones │Dakota │1│29│0.97│
│Saxon │Joan   │1│31│2.8 │
│Smith │John   │2│23│1.25│
│Smits │Jack   │2│27│3.14│
│Wilson│Diana  │1│23│1.25│
└──────┴───────┴─┴──┴────┘

The grade can also be computed by replacing each column with the rankings within that column, equal items being assigned equal rankings, and then grading the rows of the resulting integer table. Thus:

      ↑ {(⍋⍋⍵)[⍳⍨⍵]}¨ T
5 3 1 7 4 0 6 2
5 0 7 1 4 6 3 2
4 0 4 0 0 4 4 0
1 4 7 1 5 0 3 6
2 0 4 2 6 1 7 5
      ⍋ ⍉ ↑ {(⍋⍋⍵)[⍳⍨⍵]}¨ T
5 2 7 1 4 0 6 3

      tgr1 ← {⍋ ⍉ ↑ {(⍋⍋⍵)[⍳⍨⍵]}¨ ⍵}

      tgr1 T
5 2 7 1 4 0 6 3

Grade and sort can be packaged as a single function: in the monadic case, torder ⍵ computes the grade of ⍵; in the dyadic case, ⍺ torder ⍵ orders ⍺ by the grade of ⍵.

      torder ← {0=⎕nc '⍺':tgr ⍵ ⋄ (tgr ⍵)tindex ⍺}

      torder T
5 2 7 1 4 0 6 3
      ⍪¨ T torder T
┌──────┬───────┬─┬──┬────┐
│Angelo│Roberto│2│19│1.11│
│Chan  │Wilson │2│47│2.11│
│Franck│Donna  │1│38│2.72│
│Jones │Dakota │1│29│0.97│
│Saxon │Joan   │1│31│2.8 │
│Smith │John   │2│23│1.25│
│Smits │Jack   │2│27│3.14│
│Wilson│Diana  │1│23│1.25│
└──────┴───────┴─┴──┴────┘
      ⍪¨ T torder T[1 0]
┌──────┬───────┬─┬──┬────┐
│Jones │Dakota │1│29│0.97│
│Wilson│Diana  │1│23│1.25│
│Franck│Donna  │1│38│2.72│
│Smits │Jack   │2│27│3.14│
│Saxon │Joan   │1│31│2.8 │
│Smith │John   │2│23│1.25│
│Angelo│Roberto│2│19│1.11│
│Chan  │Wilson │2│47│2.11│
└──────┴───────┴─┴──┴────┘

The definition of torder can be made more elegant if there is a dyadic operator ⠴ (U+2834) such that
        M⠴D ⍵  ←→  M ⍵
      ⍺ M⠴D ⍵  ←→  ⍺ D ⍵

whence torder obtains as tgr⠴{(tgr ⍵)tindex ⍺}.

Here are the definitions used in this note, collected:

      ⎕ct ← 0

      invert   ← {↑¨ ⊂⍤¯1 ⍉ ⍵}
      assert   ← {⍺←'assertion failure' ⋄ 0∊⍵:⍺ ⎕signal 8 ⋄ shy←0}
      tassert  ← {
          assert (1≤≢⍵)∧1=⍴⍴⍵ :    ⍝ non-empty vector
          assert (⊃=⊢)≢¨⍵ :        ⍝ equal tally in each item
          assert 2=≡⍵ :            ⍝ nested array with simple items
          1
      }
      tindex   ← {(⊂⊂⍺)⌷¨⍵}
      tix1     ← {(⍉↑⍺⍳¨⍺)⍳(⍉↑⍺⍳¨⍵)}
      tix      ← 8⌶
      teps     ← {(≢⊃⍵) > ⍵ tix ⍺}
      twithout ← {⍺ ⌿¨⍨ ⊂~⍺ teps ⍵}
      tunique  ← {⍵ ⌿¨⍨ ⊂(⍳≢⊃⍵)=tix⍨ ⍵}
      tkey     ← {(⊂tix⍨⍺) ⍺⍺⌸¨ ⍵}
      tgr      ← {⊃ {⍵[⍋(⊂⍵)⌷⍺]}/ ⍵,⊂⍳≢⊃⍵}
      tgr1     ← {⍋ ⍉ ↑ {(⍋⍋⍵)[⍳⍨⍵]}¨ ⍵}
      torder   ← {0=⎕nc '⍺':tgr ⍵ ⋄ (tgr ⍵)tindex ⍺}

Translated and adapted from the J Wiki Essay Inverted Table, 2007.