6

I've got a parser written using Ruby's standard StringScanner. It would be nice if I could use it on streaming files. Is there an equivalent to StringScanner that doesn't require me to load the whole string into memory?

jes5199

3 Answers

1

You might have to rework your parser a bit, but you can feed lines from a file to a scanner like this:

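require 'strscan'

File.open('filepath.txt', 'r') do |file|
  scanner = StringScanner.new(String.new)  # start with an empty, mutable buffer
  until file.eof?
    scanner << file.readline   # append the next line to the scanner's buffer
    scanner.scan(/whatever/)   # try to match at the current scan position
  end
end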
mckeed
  • 4
    I know this is years later but that still reads the whole file into memory. Once you've reached eof the "scanner" is holding the full copy of the file... (It doesn't release anything after the string pointer moves past the contents) – Sam Stelfox Aug 09 '13 at 20:39
0

StringScanner was designed for exactly that: loading a big string and moving back and forth over it with an internal pointer. If you turn it into a stream, the references to earlier input are lost, so you cannot use unscan, check_until, pre_match, or post_match unless you buffer all the previous input.
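As a small illustration of that dependence on earlier input, here is a sketch (the sample strings and variable names are made up for the example) showing that pre_match and unscan work while the text is still in the buffer, and that the match data is gone once the buffer is replaced:

```ruby
require 'strscan'

scanner = StringScanner.new("key = value")
scanner.scan(/\w+/)             # consumes "key"
scanner.scan(/\s*=\s*/)         # consumes " = "
before  = scanner.pre_match     # text before the last match
scanner.unscan                  # rewind to just before the last match
rewound = scanner.rest          # everything from the rewound position on

# Replacing the buffer, as a streaming reader would, resets the scanner,
# so the earlier input and its match data are no longer available.
scanner.string = "next chunk"
after_reset = scanner.pre_match
```

Here before is "key", rewound is " = value", and after_reset is nil.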

If you are concerned about the buffer size, just load the data in chunks and use a simple regexp or a gem called Parser. The simplest way is to read a fixed-size block of data at a time.

# iterate over fixed-length records
File.open("fixed-record-file") do |f|
  while (record = f.read(1024))
    # parse the record here using a regexp or a parser
  end
end

[Updated]

Even with this loop you can use StringScanner; you just need to update the string with each new chunk of data:

string=(str)

Changes the string being scanned to str and resets the scanner. Returns str
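One way this could look in practice is sketched below. It is only an illustration, not a drop-in replacement for an existing parser: each_word is a hypothetical helper, whitespace-separated words stand in for real tokens, and the chunk size is arbitrary. The idea is to carry the unconsumed tail (scanner.rest) over into the next buffer, so a token split across a chunk boundary is not lost and the scanner never holds more than the tail plus one chunk.

```ruby
require 'strscan'
require 'stringio'

# Hypothetical helper: yields each whitespace-separated word from io,
# reading chunk_size bytes at a time and keeping only unconsumed input.
def each_word(io, chunk_size: 1024)
  scanner = StringScanner.new(String.new)
  while (chunk = io.read(chunk_size))
    # Drop what has already been consumed, then add the new chunk.
    scanner.string = scanner.rest + chunk
    loop do
      scanner.skip(/\s+/)
      # Accept a word only if whitespace follows it, so a word that is
      # cut off at the end of the buffer waits for the next chunk.
      word = scanner.scan(/\S+(?=\s)/)
      break unless word
      yield word
    end
  end
  # At end of input, whatever is left is the final word.
  scanner.skip(/\s+/)
  yield scanner.rest unless scanner.eos?
end

words = []
each_word(StringIO.new("alpha beta  gamma\ndelta"), chunk_size: 4) { |w| words << w }
```

Because string= resets the scanner, anything that relies on earlier matches (unscan, pre_match) only works within the current buffer. Note also that io.read with a length returns binary-encoded strings, so a real parser may need to handle encoding and multibyte characters split across chunks.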

Snake Sanders
  • The reference to the Parser gem (which is [whitequark/parser](https://github.com/whitequark/parser/tags)) is irrelevant, since Parser is a **Ruby** parser (in Ruby). – akim Jan 17 '21 at 09:08
-1

There is StringIO.

Sorry, I misread your question. Take a look at this; it seems to have streaming options.

Bobby