4

I am using Axlsx to create an excel file. For a small dataset, it works fine. But once the dataset gets big, it just hangs. I ran strace on the process, it was doing a lot brk.

a = Axlsx::Package.new
book = a.workbook
book.add_worksheet(:name => "test") do |sheet|

  input_array.each do |input_data|
     ...# covert input_data to row_data
     sheet.add_row(row_data)
  end
end
File.open("testfile", 'w') { |f| f.write(p.to_stream().read) }

My input_array size is about 400,000, so the worksheet has 400,000 rows, quite large. It got stuck at p.to_stream().read. Any help would be great. Thanks.

Lomefin
  • 1,173
  • 12
  • 43
user612308
  • 1,813
  • 5
  • 23
  • 22

2 Answers2

5

Looks like I need to start paying attention to SO! This is randym (the author of axlsx)

There are a couple of things I'd like to point out that should help you get what you need done, well... done!

  1. If you are writing to a file, consider Package#serialize - not because it is faster, but because it is less code for you to maintain.

    p.serialize 'filename.xlsx'

  2. Major performance improvements have been made over the last few weeks. Please upgrade to 1.1.1 The gem no longer depends on RMagic, and use_autowidth = false is no longer required.

https://github.com/randym/axlsx

Benchmarks for master:

Benchmarks w/40k rows:
                            user     system      total        real
axlsx_noautowidth      68.130000   1.690000  69.820000 ( 80.257108)
axlsx                  61.520000   2.290000  63.810000 ( 78.187423)
axlsx_shared           53.280000   1.170000  54.450000 ( 62.880780)
axlsx_stream           52.110000   1.360000  53.470000 ( 61.980672)
csv                    10.670000   0.930000  11.600000 ( 14.901387)

Benchmarks w/4k rows:
                            user     system      total        real
axlsx_noautowidth       4.880000   0.120000   5.000000 (  5.314383)
axlsx                   5.470000   0.110000   5.580000 (  5.853739)
axlsx_shared            5.720000   0.080000   5.800000 (  6.135263)
axlsx_stream            4.840000   0.090000   4.930000 (  5.194801)
csv                     1.090000   0.090000   1.180000 (  1.484763)

Here is the benchmarking file:

https://gist.github.com/2411144

Hope this helps

randym
  • 2,430
  • 1
  • 19
  • 18
  • Hi randym, I am trying to use axlsx in my Rails app. I have used this gem in a separate ruby program which works great. The program is: p = Axlsx::Package.new p.workbook.add_worksheet(:name => "sample_dataset") do |sheet| dataset.map {|row| sheet.add_row row} end p.use_shared_strings = true # p.serialize('/Users/Rakib/Desktop/New File_files/simple.xlsx') p.serialize('simple.xlsx') But when I try to use the same ruby code in my Rails app I keep getting error. question in SO: http://stackoverflow.com/questions/23986981/uninitialized-constant-zipdostime-error-when-using-axlsx-gem-in-rails-app – K M Rakibul Islam Jun 02 '14 at 05:20
  • 1
    Also, I have found few blog posts about using axlsx in Rails e.g. http://pramodbshinde.wordpress.com/2013/12/29/design-spreadsheets-using-axlsx-in-rails/ They showed that pacakege.serialize method does not take any param. But this is not working for me in my app currently. Also, after some debugging I found out that, p.serialize("#{Rails.root}/tmp/basic.xlsx") this line basically fails in my app. could you please tell my why this serialization is not working in my Rails app? I would really appreciate if you could point out the problem. Thanks in advance randym. – K M Rakibul Islam Jun 02 '14 at 05:24
  • But what of another order of magnitude? The OP was for 400k rows, not 40k. I'm also finding limitations here @ 400k rows in my application. What are the limits for axlsx? – Jimi Kimble Nov 02 '15 at 15:05
2

You can try to disable RMagick, which handles column autowidth feature, since it's quite heavy process AFAIK.

a = Axlsx::Package.new   
a.use_autowidth = false
Nash Bridges
  • 2,350
  • 14
  • 19
  • I am getting this error: undefined method `use_autowidth=' for # (NoMethodError). i am using axlsx-1.0.18. I went through Axlsx source code, it seem use_autowidth is not defined. – user612308 Apr 09 '12 at 14:04
  • @user612308 Yeah, it was added [a bit later](https://github.com/randym/axlsx/commit/26a8ad445d84b9dcb9fc36702ec761603a74ee20) and released in 1.1. – Nash Bridges Apr 09 '12 at 15:21
  • thanks Nash. tried, still the same issue. i think the issue is not with use_autowidth, it has something to do with how to_stream() is implemented. right now, i don't have good solutions. i have to set a limit for the number of rows i can add. – user612308 Apr 10 '12 at 08:28