
What is the common way in Rebol to read a text file line by line, without loading the entire file into memory?

I am doing the following, but I think (correct me if I'm wrong) that it loads the whole file into memory first:

foreach line read/lines %file.txt [ print line ]
mydoghasworms

2 Answers


At least with Rebol 2,

read/lines/direct/part %file.txt 1

should come close to what you want for a single line, but if you want all the lines, one after the other, it should be something like:

f: open/lines/direct %test.txt
while [l: copy/part f 1] [print l]
close f

In theory you can redefine any function, even natives. Here is an attempt at a new foreach that also accepts a file or port:

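; keep a reference to the native foreach before replacing it
; (run this only once: a second run would make foreach_ refer to the new foreach)
foreach_: :foreach
foreach:  func [
    "Evaluates a block for each value(s) in a series, or for each line of a file or port."
    'word [get-word! word! block!] {Word or block of words to set each time (will be local)}
    data [series! file! port!] "The series to traverse"
    body [block!] "Block to evaluate each time"
    /local port line
] [
    either any [port? data   file? data] [
        attempt [
            port: open/direct/lines data
            ; copy/part returns a block holding the next line, or none at the end
            while [line:  copy/part port 1] [
                set :word line
                do :body
            ]
        ]
        attempt [close port]
    ] [
        ; any other series: fall back to the native foreach
        foreach_  :word :data :body
    ]
]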

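The set :word line part and the attempt calls should probably be more elaborate, to avoid name clashes and to get meaningful errors: as written, the word is set in the caller's context instead of being local, and attempt silently swallows any error raised inside the body.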
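To illustrate the name-clash point, here is a minimal Rebol 2 sketch of a separate helper that leaves the native foreach alone. The name `for-each-line` is hypothetical, not a built-in. It turns the body into a function whose only argument is the given word, so that word stays local to the body. Like the code above, it assumes that `copy/part` on an open/direct/lines port returns a block holding one line, and `none` at the end of the file.

```rebol
; Hypothetical helper (Rebol 2), not a built-in function.
for-each-line: func [
    "Evaluates BODY once for each line of FILE, read one line at a time"
    'word [word!] "Word set to each line (local to BODY)"
    file [file!] "File to read"
    body [block!] "Block to evaluate for each line"
    /local fn port chunk
][
    ; turn BODY into a function with WORD as its argument,
    ; so WORD is local to BODY and cannot clobber an outer word
    fn: func reduce [word] body
    port: open/direct/lines file
    ; copy/part returns a block holding the next line, or none at the end
    while [chunk: copy/part port 1] [fn first chunk]
    ; errors in BODY are not caught, so they stay visible,
    ; but they skip this close and leave the port open
    close port
]
```

Usage: `for-each-line line %file.txt [print line]`. Unlike attempt, this lets errors from the body surface; closing the port reliably on error would need extra handling.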

sqlab
  • But doesn't `open/lines/direct` load the whole file into memory? That's what I'm trying to avoid, as per my question. – mydoghasworms Jan 14 '15 at 12:22
  • No, 'open does just that: it opens the file, whereas 'read reads it. Well, maybe 'open buffers in some modes, but there is also a 'seek mode you could use - http://www.rebol.com/article/0199.html – pekr Jan 14 '15 at 12:29
  • From personal experience I can confirm that open/lines/direct can use less memory than open/lines, but you will only see the difference with bigger files. Probably there is a minimal internal buffer that always gets filled first. – sqlab Jan 14 '15 at 13:28
  • I use open/direct/lines on some huge (bigger than memory) files and it just works as expected, doesn't load whole file into memory. – endo64 Jan 15 '15 at 09:27
  • Like I said in my comment to draegtun: I was hoping to do something like the following in Ruby: `open('file.txt').each {|line| puts line}`. I guess one could write a variant of the `foreach` function in Rebol to do something similar that accepts a file or port. Can you change the behaviour of standard functions otherwise (e.g. let `foreach` accept a `port!`)? – mydoghasworms Jan 19 '15 at 10:38
  • In theory you can supersede any function, even natives. I will try to give a new foreach. – sqlab Jan 19 '15 at 12:39

Yes, open is the way to go. However, as sqlab touches on, the /lines and /direct refinements you need are not present in Rebol 3's open (yet).

The good news though is that you can still use open to read in large files in Rebol 3 without these refinements...

file: open %movie.mpg
while [not empty? data: read/part file 32000] [
    ;
    ; read in 32000 bytes from file at a time
    ; process data
]
close file

So you just need to wrap this up into a buffer and process a line at a time.

Here's a crude working example I've put together:

file: open/read %file.txt
eol: newline
buffer-size: 1000
buffer: copy ""    ; copy, so the literal in the script is not modified
lines: copy []

while [
    ;; start buffering
    if empty? lines [
        ;; fill buffer until we have eol or EOF
        until [
            append buffer to-string data: read/part file buffer-size
            any [
                empty? data
                find buffer eol
            ]
        ]
        lines: split buffer eol
        buffer: take/last lines
    ]

    line: take lines
    not all [empty? data empty? buffer]
  ][
    ;; line processing goes here!
    print line
]

close file
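The same buffering idea can be packaged so that calling it reads closer to the Ruby `each` mentioned in the comments. This is a Rebol 3 sketch under the same assumption as the code above (`read/part` on the port returns an empty binary at end of file); the name `for-each-line` and the 4096-byte chunk size are hypothetical choices. It keeps the buffer as binary and splits on the newline byte, which avoids cutting a multi-byte UTF-8 character in half at a chunk boundary.

```rebol
; Hypothetical helper (Rebol 3), not a built-in function.
for-each-line: func [
    "Evaluates BODY for each line of FILE without reading it all at once"
    'word [word!] "Word set to each line (local to BODY)"
    file [file!] "File to read"
    body [block!] "Block to evaluate for each line"
    /local fn port buffer chunk pos
][
    fn: func reduce [word] body       ; makes WORD local to BODY
    port: open/read file
    buffer: copy #{}
    while [not empty? chunk: read/part port 4096] [
        append buffer chunk
        ; hand over every complete line currently in the buffer;
        ; the newline byte never occurs inside a UTF-8 character
        while [pos: find buffer #{0A}] [
            fn to string! copy/part buffer pos
            remove/part buffer next pos   ; drop the line and its newline
        ]
    ]
    ; the last line may not end with a newline
    if not empty? buffer [fn to string! buffer]
    close port
]
```

Usage: `for-each-line line %file.txt [print line]`. With files that use CRLF line endings, each line keeps its trailing carriage return in this sketch.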
draegtun
  • That's quite a lot of code. I was hoping to do something like the following in Ruby: `open('file.txt').each {|line| puts line}`. I guess one could write a variant of the `foreach` function in Rebol to do something similar that accepts a file or port. Can you change the behaviour of standard functions otherwise (e.g. let `foreach` accept a `port!`)? – mydoghasworms Jan 19 '15 at 10:37