4

Let's say I want to combine several massive files into one and then call uniq! on the result (THAT alone might take a hot second).

It's my understanding that File.readlines() loads ALL the lines into memory. Is there a way to read a file line by line, sort of like how the Node.js pipe() system works?

webster
dsp_099

4 Answers

6

One of the great things about Ruby is that you can do file IO in a block:

File.open("test.txt", "r") do |file|
  file.each_line do |row|   # reads one line at a time
    puts row
  end
end               # file closed here

so things get cleaned up automatically. Maybe it doesn't matter in a little script, but it's always nice to know you can get it for free.
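A minimal sketch of that cleanup, assuming a throwaway file created with Tempfile (the file contents and the variable names `tmp`, `handle` and `rows` are illustrative, not from the answer). The handle yielded to the File.open block should report itself as closed once the block returns:

```ruby
require "tempfile"

# Create a small throwaway file to read from (illustrative only).
tmp = Tempfile.new("sample")
tmp.write("first\nsecond\n")
tmp.close

handle = nil
rows = []
File.open(tmp.path, "r") do |file|
  handle = file                               # keep a reference to inspect afterwards
  file.each_line { |row| rows << row.chomp }  # one line at a time
end

puts rows.inspect
puts handle.closed?   # the block form of File.open closed the file for us
```

Note that the automatic close comes from passing the block to File.open itself; a block passed only to each_line iterates the lines but leaves the file open.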

seph
3

With readline you aren't operating on the entire file contents at once: only the current line is held in memory, so you never need to store the whole file.

File.open("sample.txt", "r") do |file|   # block form closes the file afterwards
  until file.eof?
    line = file.readline
    puts line
  end
end
Muaaz Rafi
  • If this is the case, why does loading a 350 MB file with "readlines" take like 5 seconds? I just assumed it was "preloading" the array – dsp_099 Aug 19 '16 at 05:23
  • Yeah, I had the same problem, but readline saved me a lot of time. If this works, please accept the answer. – Muaaz Rafi Aug 19 '16 at 05:25
1

Large files are best read with streaming methods such as each_line, shown in the other answer, or foreach, which opens the file and reads it line by line. So unless the process needs the whole file in memory, use the streaming methods. With streaming, the required memory doesn't grow as the file gets bigger, as opposed to non-streaming methods like readlines.

File.foreach("name.txt") { |line| puts line }
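Applied to the question's use case, a rough sketch might stream every input file with foreach and write each distinct line to the output as it goes. The helper name `merge_unique`, the Tempfile inputs and the use of a Set to remember seen lines are all assumptions made for illustration, not something from the question:

```ruby
require "set"
require "tempfile"

# Hypothetical helper: copy each distinct line from `paths` into `out_path`,
# reading the inputs one line at a time. Returns the number of distinct lines.
def merge_unique(paths, out_path)
  seen = Set.new
  File.open(out_path, "w") do |out|
    paths.each do |path|
      File.foreach(path) do |line|
        line = line.chomp
        # Set#add? returns nil when the line has been seen before.
        out.puts(line) if seen.add?(line)
      end
    end
  end
  seen.size
end

# Illustrative inputs with one overlapping line.
a = Tempfile.new("a"); a.write("x\ny\n"); a.close
b = Tempfile.new("b"); b.write("y\nz\n"); b.close
out = Tempfile.new("out"); out.close

count = merge_unique([a.path, b.path], out.path)
merged = File.read(out.path).split("\n")
puts count
puts merged.inspect
```

In this sketch the Set still keeps every distinct line in memory, so memory use grows with the number of unique lines rather than with the total size of the files.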

sugaryourcoffee
1

uniq! is defined on Array, so you'll have to read the files into an Array anyway. You cannot process the file line-by-line because you don't want to process a file, you want to process an Array, and an Array is a strict in-memory data structure.

Jörg W Mittag