Exploiting the streaming options of ⎕S
⎕NPUT and ⎕NGET are great, but they do not do blocking, and sometimes we need to deal with huge text files.
The streaming options for ⎕S and ⎕R offer some intriguing possibilities. Consider counting the number of rows or lines in a file:
CountRows←{
    ⍝ Count the rows (lines) of a text file without reading it all into the workspace
    t←⍵ ⎕NTIE 0        ⍝ tie the native file
    v←'ML' 1           ⍝ match limit of 1
    r←≢('^'⎕S 0⍠v)t    ⍝ match start-of-line; ⎕S streams through the tied file
    r⊣⎕NUNTIE t        ⍝ native files are untied with ⎕NUNTIE, not ⎕FUNTIE
}
Not the fastest thing in the world, but no WS FULL with a huge file. Obviously we could code up a looping function using ⎕NREAD, but having to check for newline types, blocking, and record fragments is a pain, and ⎕S does all of this for us. I assume I could do this with ⎕CSV too, using blocking, but I think I would still have to write a loop around it.
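For comparison, here is a rough sketch of what that ⎕NREAD loop might look like. It simply counts LF characters block by block, so it sidesteps record fragments entirely, but it assumes LF (or CRLF) line endings, uses an arbitrary block size, and will not count a final line that lacks a terminating newline; exactly the sort of edge cases ⎕S handles for us:

∇ r←CountRowsLoop file;t;sz;blk;pos;chunk
  ⍝ Sketch: count lines by reading fixed-size blocks with ⎕NREAD
  ⍝ and counting LF characters. Assumes LF or CRLF line endings;
  ⍝ an unterminated final line is not counted.
  t←file ⎕NTIE 0
  sz←⎕NSIZE t
  blk←1000000                          ⍝ block size in bytes (arbitrary)
  r←0 ⋄ pos←0
  :While pos<sz
      chunk←t ⎕NREAD 82(blk⌊sz-pos)pos ⍝ 82: read as characters
      r+←+/chunk=⎕UCS 10               ⍝ count LFs in this block
      pos+←≢chunk
  :EndWhile
  ⎕NUNTIE t
∇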
I'm not sure how much of the time is spent just reading the file, and how much the regex '^' consumes. This is a trivial search pattern, and I wonder if there is an opportunity for optimization here: bypass PCRE entirely when you just want to iterate through the rows. It strikes me that there may be many useful applications of iterating through the lines of a text file.
One such application is simply to read the whole file in blocks, processing large chunks at a time (again, ⎕CSV will let me do this). While ⎕S will start searching at the current file position, there is no way, as far as I can tell, to stop reading after processing n rows. I thought mixed mode might do it, as I can still search for the start of a line, and the match limit applies to the whole document, but, like document mode, it appears it must read in the entire file at one time, and that defeats the purpose.
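For the ⎕CSV route, my recollection is that the 'Records' variant option, with a tie number as the source, makes ⎕CSV return up to n records per call and remember the file position between calls, so the surrounding loop is short. The exact argument shape here is from memory, so treat this as a sketch rather than a tested function:

∇ ProcessByBlocks file;t;m
  ⍝ Sketch (from memory): read up to 1000 records per call via
  ⍝ ⎕CSV's Records variant, looping until the tied file is exhausted.
  t←file ⎕NTIE 0
  :Repeat
      m←(⎕CSV⍠'Records' 1000)t
      ⍝ ... process the block of up to 1000 records in m here ...
  :Until 0=≢m
  ⎕NUNTIE t
∇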
Anyway, interesting stuff.
- paulmansour