There is a program that generates huge CSV files. For example:

require 'csv'

arr = (0..10).to_a
CSV.open("foo.csv", "wb") do |csv|
  (2**16).times { csv << arr }
end

It generates a big file, so I want it compressed on the fly: instead of writing an uncompressed CSV file (foo.csv), the program should write a bzip2-compressed CSV file (foo.csv.bzip).

I have an example from the "ruby-bzip2" gem:

require 'bzip2'

writer = Bzip2::Writer.new File.open('file', 'wb') # the file must be opened for writing
writer << 'data1'
writer.close

I am not sure how to combine the Bzip2 writer with the CSV one.

osgx
Israel
  • I'm not sure what you are asking: do you want to eliminate the bzip file creation, the CSV file creation, or something else? – mdesantis Apr 23 '14 at 21:00

2 Answers

You can also construct a CSV object with an IO or something sufficiently like an IO, such as a Bzip2::Writer.

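For example:

require 'csv'
require 'bzip2'

arr = (0..10).to_a
File.open('file.bz2', 'wb') do |f|
  writer = Bzip2::Writer.new f
  # CSV() wraps the writer, so each row is compressed as it is written
  CSV(writer) do |csv|
    (2**16).times { csv << arr }
  end
  writer.close # flush the remaining compressed data
end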
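The same pattern works with any IO-like writer, because CSV only needs an object it can append strings to. The sketch below swaps in Zlib::GzipWriter (gzip, from Ruby's standard library) for Bzip2::Writer purely so it runs without installing a gem; it is not bzip2 output. The helper name `write_gzipped_csv` is hypothetical.

```ruby
require "csv"
require "zlib"

# Hypothetical helper: wrap `io` in a gzip compressor, wrap that in CSV,
# and stream the rows through both. Nothing is buffered as a whole file.
def write_gzipped_csv(io, rows)
  gz = Zlib::GzipWriter.new(io)   # stand-in for Bzip2::Writer
  csv = CSV.new(gz)               # CSV writes each row with gz << line
  rows.each { |row| csv << row }
  gz.finish # writes the gzip trailer but, unlike close, leaves `io` open
  io
end
```

To write a file, pass a File opened in binary mode, e.g. `File.open("foo.csv.gz", "wb") { |f| write_gzipped_csv(f, Array.new(2**16) { arr }) }`. Using `finish` instead of `close` leaves ownership of `io` with the caller, which also makes the helper easy to try against an in-memory StringIO.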
Frederick Cheung

Maybe it would be more flexible to write the CSV data to stdout:

# csv.rb
require 'csv'
$stdout.sync = true

arr = (0..10).to_a
(2**16).times do
  puts arr.to_csv
end

... and pipe the output to bzip2:

$ ruby csv.rb | bzip2 > foo.csv.bz2
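If the pipe has to be set up by the Ruby program itself, one option is to spawn the compressor with IO.popen and write the CSV rows to its stdin. This is a minimal sketch, assuming the `bzip2` binary is on the PATH; the helper name `write_csv_through` is hypothetical.

```ruby
require "csv"

# Hypothetical helper: stream CSV rows into an external compressor's stdin.
# `command` is an argv array such as ["bzip2", "-c"]; the child's stdout is
# redirected to the file at `path`, so the compressed bytes never pass back
# through Ruby.
def write_csv_through(command, path, rows)
  IO.popen(command, "w", out: path) do |pipe|
    rows.each { |row| pipe.write(row.to_csv) }
  end
  # The block form of IO.popen closes the pipe and waits for the child,
  # so $? holds the compressor's exit status here.
  raise "#{command.first} exited with #{$?.exitstatus}" unless $?.success?
  path
end
```

For the question's data this would be called as `write_csv_through(["bzip2", "-c"], "foo.csv.bz2", Array.new(2**16) { arr })`. Passing the command as an array avoids going through a shell, so no quoting is needed.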
Stefan
  • Of course, this is a nice solution, but I need to do all the processing in Ruby. Your solution using bash is pretty good. Anyway, thank you. – Israel Apr 23 '14 at 22:20
  • *WHY* does all the processing have to be done in Ruby? Piping the output into a compression program is a more standard (which means well-tested) and more flexible practice. – the Tin Man Apr 23 '14 at 23:42