Exploiting the streaming options of ⎕S

Post by paulmansour on Sat May 27, 2017 2:54 pm

⎕NPUT and ⎕NGET are great, but they do not read or write in blocks, and sometimes we need to deal with huge text files.

The streaming options for ⎕S and ⎕R offer some intriguing possibilities. Consider counting the number of rows or lines in a file:

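      CountRows←{
          ⍝ Count the rows (lines) of the text file named ⍵.
          t←⍵ ⎕NTIE 0            ⍝ native tie, next free tie number
          v←'ML' 1               ⍝ match limit: at most one match per line
          r←≢('^'⎕S 0⍠v)t        ⍝ stream the file, one offset per line
          r⊣⎕NUNTIE t            ⍝ native files are untied with ⎕NUNTIE
      }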


Not the fastest thing in the world, but no WS FULL with a huge file. Obviously we could code up some looping function using ⎕NREAD, but having to check for newline types, blocking, and record fragments is a pain, and ⎕S does all of this for us. I assume I can do this with ⎕CSV too, using blocking, but I think I would still have to write a loop around it.
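For comparison, here is a rough sketch of the kind of ⎕NREAD loop I mean. The name CountLF and the 1 MiB default block size are just made up for illustration. It only counts LF bytes, so it ignores exactly the hard parts: CR-only files, a last line with no trailing newline, and encodings where a byte value of 10 is not a line feed.

```apl
      CountLF←{
          ⍝ Count LF bytes in the file named ⍵, reading ⍺ bytes per block.
          ⍺←1048576                  ⍝ assumed default block size: 1 MiB
          blk←⍺
          t←⍵ ⎕NTIE 0
          size←⎕NSIZE t
          n←0{
              ⍵≥size:⍺               ⍝ ⍵ is the file offset, ⍺ the count so far
              b←⎕NREAD t 83 blk ⍵    ⍝ next block as 8-bit integers
              (⍺+(+/10=b))∇ ⍵+≢b     ⍝ 10 is LF; advance by the bytes actually read
          }0
          n⊣⎕NUNTIE t
      }
```

Because it never decodes text or splits records, a line that straddles two blocks is not a problem here, but it would be as soon as we wanted the lines themselves rather than a count.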

I'm not sure how much of the time is spent just reading the file, and how much the regex '^' consumes. This is a trivial search pattern, and I wonder if there is an opportunity for optimization here, bypassing PCRE when you just want to iterate through the rows. It strikes me there may be many useful applications of iterating through the lines of a text file.
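As one example of that kind of line iteration, the same streaming approach should give the length of the longest line. This is only a sketch along the lines of CountRows: MaxLen is a made-up name, and it relies on transformation code 1 returning the length of each match.

```apl
      MaxLen←{
          ⍝ Length in characters of the longest line of the text file named ⍵.
          t←⍵ ⎕NTIE 0
          r←⌈/0,('^.*$'⎕S 1⍠'ML' 1)t   ⍝ transformation code 1: length of each match
          r⊣⎕NUNTIE t
      }
```

Like CountRows, this still builds a result with one number per line, so it saves workspace relative to reading the whole file but is not constant in memory.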

One such is simply to read the whole file in blocks, processing large chunks at a time (again, ⎕CSV will let me do this). While ⎕S will start searching at the current file position, there is no way, as far as I can tell, to stop reading after processing n rows. I thought mixed mode might do it, as I can still search for the start of a line, and the match limit applies to the whole document, but, like document mode, it appears that it must read in the entire file at one time, and that defeats the purpose.

Anyway, interesting stuff.