I've got a problem converting a big XLSX (over 600 MB) to CSV. With smaller files (under 3 MB) it's fine, but with big files I can watch it eat up all the memory and then just create an empty test.csv file.

So far I have used this module: node-xlsx
I followed the guide here: Convert XLS to CSV on the server in Node

T2Admin
  • Just curious, why are you trying to parse it to a CSV? Do you have an example of your xls and what you would like the output to look like in the CSV? – tilted Nov 29 '16 at 23:29
  • The xlsx is just a simple sheet with a lot of columns and some data in them, no graphs or any visual objects in there. – T2Admin Nov 30 '16 at 05:44

3 Answers


The guide you are following works fine for smaller files, and the xlsx (SheetJS) library it uses is the most popular one. However, it will fail for large files because of the limit on Node's RAM usage.

You can raise that limit with --max_old_space_size=4096, where the value is in MB (so 4096 = 4 GB).
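
For example (a sketch; convert.js is a placeholder for whatever script you run):

    node --max_old_space_size=4096 convert.js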

Cybersupernova

These answers do not solve the memory issue.

If you load the entire Excel file (or the resulting array) into memory, it can take up a lot of RAM, often more than 1 GB.

The way to handle extremely large datasets (1 million+ rows) without crashing your server is to not load everything into memory and to use streams instead:

    import Excel from 'exceljs';

    // filePath, rowHeaders, pageCount and searchId are assumed to be defined
    // elsewhere. The streaming WorkbookWriter flushes committed rows to disk
    // instead of keeping the whole workbook in memory.
    const workbook_write_stream = new Excel.stream.xlsx.WorkbookWriter({
      filename: filePath,
    });

    const sheet = workbook_write_stream.addWorksheet('Data');

    // Write the header row and flush it immediately.
    sheet.addRow(rowHeaders).commit();

    for (let i = 1; i <= pageCount; i++) {
      // Fetch the data page by page, so only one page is in memory at a time.
      const data = await this.makePostRequest(searchId, i);
      const rows = data.records;

      for (const row of rows) {
        const formattedRow = rowHeaders.map((header) => row[header]);
        // The original snippet built formattedRow but never wrote it;
        // committing each row flushes it to disk right away.
        sheet.addRow(formattedRow).commit();
      }
    }

    sheet.commit();
    await workbook_write_stream.commit();
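
The question goes the other way (reading a huge XLSX and producing CSV), and exceljs has a streaming reader for that side too. Here is a minimal sketch, assuming placeholder paths input.xlsx and output.csv; the CSV quoting is deliberately simplistic and write backpressure is ignored:

    const Excel = require('exceljs');
    const fs = require('fs');

    async function xlsxToCsv(inputPath, outputPath) {
      const out = fs.createWriteStream(outputPath);
      // Stream-read the workbook so rows are processed one at a time.
      const workbookReader = new Excel.stream.xlsx.WorkbookReader(inputPath, {
        sharedStrings: 'cache', // needed so string cells resolve to text
        worksheets: 'emit',
      });

      for await (const worksheetReader of workbookReader) {
        for await (const row of worksheetReader) {
          // row.values is 1-based; index 0 is unused. Array.from fills any
          // sparse slots with undefined so every cell gets quoted.
          const cells = Array.from(row.values.slice(1), (v) =>
            `"${String(v ?? '').replace(/"/g, '""')}"`
          );
          out.write(cells.join(',') + '\n');
        }
        break; // this sketch only converts the first worksheet
      }
      out.end();
    }

    xlsxToCsv('input.xlsx', 'output.csv').catch(console.error);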
Tosh

My main issue with the big sheet not working was that I was on Node v7.10.1 (with npm 4.2.0); it worked after I upgraded to Node v14.17.0 (with npm 6.14.13). I also used --max-old-space-size=4096 when running my Node script.

Node XLSX

I only started using this one after SheetJS began outputting no data because of the larger sheets. I used it together with PapaParse because I find it easy to use.

const xlsx = require('node-xlsx');
const Papa = require('papaparse');
const fs = require('fs');

// inputPath and outputPath are assumed to be defined.
// Note: node-xlsx still parses the whole workbook into memory.
const sheets = xlsx.parse(inputPath);

sheets.forEach((sheet, i) => {
    // Papa.unparse turns the sheet's array-of-arrays data into a CSV string.
    // fs.writeFile without a callback throws in modern Node, so use the
    // sync variant here.
    fs.writeFileSync(`${outputPath}.${i}.csv`, Papa.unparse(sheet.data));
});

SheetJS js-xlsx

I'm still including this solution because it worked nicely up until the issues with larger sheets; I also liked that it has CLI commands.

CLI command for converting Excel to CSV:

xlsx --book -o 'output.csv' 'input.xlsx'

Code to do the same in Node:

const XLSX = require('xlsx');

// inputPath and outputPath are assumed to be defined.
// readFile loads the entire workbook into memory.
const workBook = XLSX.readFile(inputPath);

// Write each sheet to its own CSV file.
workBook.SheetNames.forEach((sheet, idx) => {
    XLSX.writeFile(workBook, `${outputPath}.${idx}.csv`, {sheet: sheet, bookType: "csv"});
});
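
If the parsed workbook fits in memory but the CSV output itself is huge, SheetJS also exposes a streaming CSV writer; a minimal sketch, assuming the same inputPath and outputPath:

    const XLSX = require('xlsx');
    const fs = require('fs');

    const workBook = XLSX.readFile(inputPath); // input is still fully loaded
    workBook.SheetNames.forEach((name, idx) => {
      // XLSX.stream.to_csv returns a Readable stream of CSV rows, so the
      // CSV text is never materialized as one giant string.
      XLSX.stream.to_csv(workBook.Sheets[name])
        .pipe(fs.createWriteStream(`${outputPath}.${idx}.csv`));
    });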
Gabe Gates
  • It's not solving the memory issue. – Tosh Feb 28 '23 at 16:08
  • Memory issues can be tough, have you tried allocating more memory to your node script via the `--max-old-space-size` option? – Gabe Gates Feb 28 '23 at 19:50
  • No, adding more memory is not the point; you have to rewrite it so that it doesn't exceed a certain amount of memory, which can be done by writing directly to a file or using streams. – Tosh Feb 28 '23 at 20:43
  • I agree, this is not a solution to the original question. It was only a solution to my personal project. In my situation *PapaParse* performed my task more efficiently than *SheetJS*, solving my issue. I left the answer to help others start down this path and provide an alternative library to look into. I hope someone posts a better solution. – Gabe Gates Feb 28 '23 at 20:51