compact representation of gradeup permutation vector


Postby tclviii-dyalog on Mon Feb 01, 2016 4:42 pm

1010data started as a database-as-a-service company about 15 or 20 years ago and is now "big data in the cloud". The underlying language is k3.

I was at a kx demo where 1010data spoke. The demo was on a 73 billion row table.

The speaker said something like, "I don't have to explain the power of vector languages to you. Since we can keep the resulting permutation vector of a grade up, there are all sorts of things we can do" ... at which point I raised my hand:

"Uhm, 73 billion 64-bit floats is a big array to keep around. Even if it wasn't in memory, it would take a long time to write to disk."

The answer was something like, "Obviously we don't use a couple hundred gigabytes of floating permutation vector, we use bit maps."

Which leaves me totally confused. Does anyone have any idea what bit map compression (or compact representation) could capture what essentially amounts to an arbitrary re-ordering of iota 73 billion?
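
To put rough numbers on the question, here is a back-of-the-envelope sketch in Python (not k or APL). The row count is the one quoted in the talk. Everything else is plain arithmetic, and the variable names are only illustrative. It compares 64-bit indices, indices packed into the fewest whole bits, the information-theoretic floor for a truly arbitrary permutation, and a single one-bit-per-row bitmap.

```python
import math

n = 73_000_000_000  # row count quoted in the talk

# Size if each index is stored as a 64-bit value (float or integer).
bytes_64bit = n * 8                      # about 584 GB

# Size if each index is packed into the fewest whole bits that hold 0..n-1.
bits_per_index = math.ceil(math.log2(n))
bytes_packed = n * bits_per_index / 8    # about 338 GB

# Floor for an arbitrary permutation: log2(n!) bits.
# lgamma(n + 1) is ln(n!), so dividing by ln(2) converts nats to bits.
bits_floor = math.lgamma(n + 1) / math.log(2)
bytes_floor = bits_floor / 8             # about 316 GB

# Size of one plain bit vector with one bit per row.
bytes_bitmap = n / 8                     # about 9 GB

for label, size in [("64-bit", bytes_64bit), ("packed", bytes_packed),
                    ("floor", bytes_floor), ("one bitmap", bytes_bitmap)]:
    print(f"{label:>10}: {size / 1e9:8.1f} GB")
```

If the ordering really were arbitrary, no encoding could go below the floor figure, so any bitmap scheme would have to rely on structure in the data.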
tclviii-dyalog
 
Posts: 28
Joined: Tue Mar 02, 2010 6:04 pm

Re: compact representation of gradeup permutation vector

Postby Roger|Dyalog on Mon Feb 01, 2016 10:19 pm

- You could have (should have) asked the speaker for more details.

- The table had LOTS of duplicate rows. Even then it's dicey, because a single bit vector with 73e9 entries is already 9 GB.

- The ordering is not random but has structure that can be exploited. e.g. Nobody creates a 73e9 row table from scratch. Therefore there is a large existing table, already ordered, and you just need to know where in the large table to slot in some small number of new rows.

- The table had 73 MILLION rows rather than 73 billion rows. :-)
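
The "slot in some small number of new rows" idea above can be sketched as follows. This is a Python illustration under stated assumptions, not a description of what 1010data actually does: the base table is assumed to be already sorted, and `delta_grade` and `expand` are hypothetical helper names. Only the new rows get an entry, so the stored description grows with the number of new rows rather than with the size of the table.

```python
import bisect

def delta_grade(sorted_base, new_rows):
    """Describe the grade of sorted_base followed by new_rows compactly.

    sorted_base is assumed ascending, so its own grade is 0..len-1 and is
    not stored. Each new row gets (insertion position in base, row index).
    """
    order = sorted(range(len(new_rows)), key=lambda i: new_rows[i])
    return [(bisect.bisect_right(sorted_base, new_rows[i]), i) for i in order]

def expand(n_base, delta):
    """Rebuild the full grade-up vector over the concatenation base,new."""
    perm, prev = [], 0
    for pos, i in delta:
        perm.extend(range(prev, pos))  # untouched run of base rows
        perm.append(n_base + i)        # the new row slotted in here
        prev = pos
    perm.extend(range(prev, n_base))   # remaining base rows
    return perm

base = [10, 20, 30, 40]
new = [25, 5, 40]
delta = delta_grade(base, new)
full = expand(len(base), delta)
```

In practice `expand` would be replaced by something that walks the delta lazily, since materializing the full vector is exactly what one is trying to avoid.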
Roger|Dyalog
 
Posts: 238
Joined: Thu Jul 28, 2011 10:53 am

Re: compact representation of gradeup permutation vector

Postby paulmansour on Thu Feb 04, 2016 5:23 pm

I'm fairly sure that they block their data. A multi-billion row table is spread out over multiple machines. Some form of map/reduce is used to process manageable chunks on multiple machines, and the results are then aggregated back on the controlling machine. Furthermore, if it is time-series data, it will be sharded or blocked by time, so the table is essentially presorted. There will never be a 73 billion item vector of any sort in use.
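
A minimal sketch of why blocking by time avoids the giant vector, in Python and under a strong assumption: every key in an earlier block is less than or equal to every key in a later block. `blocked_grade` is a hypothetical name, and the blocks are small lists standing in for shards. Each block is graded locally and the global order is produced lazily, one block at a time.

```python
def blocked_grade(blocks):
    """Yield the global grade-up lazily, one block at a time.

    Assumes blocks are ordered so that no key in a later block is smaller
    than any key in an earlier block (for example, sharded by time).
    """
    offset = 0
    for block in blocks:
        # Local grade: a permutation of this block's indices only.
        local = sorted(range(len(block)), key=block.__getitem__)
        for i in local:
            yield offset + i
        offset += len(block)

blocks = [[3, 1, 2], [5, 4], [9, 7, 8]]
grade = list(blocked_grade(blocks))
```

Only one block's local permutation exists at any moment, and its indices need just enough bits to address a single block.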
paulmansour
 
Posts: 420
Joined: Fri Oct 03, 2008 4:14 pm

