
I'm working with a MySQL database and have two types of files to import. The first is a CSV file, which I can load with:

LOAD DATA INFILE 'path-to-csv_file'
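
For the CSV side, the full statement looks something like the sketch below, run from Node with the mysql2 package. The table name, field separators, and connection settings here are placeholders/assumptions, not my exact setup:

const mysql = require('mysql2');

// placeholder connection settings
const connection = mysql.createConnection({
    host: 'localhost',
    user: 'user',
    password: 'password',
    database: 'mydb'
});

// 'users' is a placeholder table name; the CSV must be readable by the MySQL server
connection.query(
    `LOAD DATA INFILE 'path-to-csv_file'
     INTO TABLE users
     FIELDS TERMINATED BY ','
     LINES TERMINATED BY '\\n'
     IGNORE 1 ROWS`,
    (err) => {
        if (err) throw err;
        connection.end();
    }
);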

The second type is ODS (OpenDocument Spreadsheet), which MySQL doesn't support with LOAD DATA INFILE. My solution was to convert ODS to CSV: read the file with the xlsx package (XLSX.readFile) and then write the CSV with csv-writer. But with large ODS files my program crashed because it used too much memory. I looked for solutions and found streams, but the xlsx package doesn't offer read streams. I then tried fs, since it has fs.createReadStream, but that module doesn't understand the ODS format. You can see the difference by comparing what fs.readFile and XLSX.readFile return for the same file.

fs.readFile:

PK♥♦m�IQ�l9�.mimetypeapplication/vnd.oasis.opendocument.spreadsheetPK♥♦m�IQM◄ˋ%�%↑Thumbnails/thumbnail.png�PNG
→
IHDR�♥A�-=♥PLTE►►☼§¶►∟↓*.!/<22/8768:G6AN>AM>BP>MaC:;A?GOE?EFJGJRJQ[TJEQOQ\QJYWYKVeX\dX]p\bkXetaNJgTEe[Wp^Wa_aja\ue\hfgektjqztkeqnpyqlwwvco�jw�j}�v{�q⌂�~�⌂{��t��t��u��z��y��|��{��{��}���o]�od�vj�|v�⌂n�⌂r��{��n��x��~��~������

XLSX.readFile:

  J323: { t: 's', v: '79770000', w: '79770000' },
  K323: { t: 's', v: '20200115', w: '20200115' },
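
For reference, the ODS-to-CSV conversion follows the pattern below. This sketch uses sheet_to_csv from the xlsx package and a placeholder output path; my actual code selects specific columns and writes them with csv-writer, shown further down. The readFile call is what pulls the whole file into memory:

const xlsx = require('xlsx');
const fs = require('fs');

// xlsx.readFile parses the entire ODS file in memory
const workbook = xlsx.readFile('C:\\Users\\me\\Downloads\\file_users.ODS');
const firstSheet = workbook.Sheets[workbook.SheetNames[0]];

// serialize the first sheet as CSV text (placeholder output path)
const csv = xlsx.utils.sheet_to_csv(firstSheet);
fs.writeFileSync('C:\\Users\\me\\Downloads\\file_users.csv', csv);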

Working with the xlsx module is easy because I can pick out only the data I want from the ODS file. With the JavaScript below I extract three columns and push them into an array:

const xlsx = require('xlsx');

const filePath = 'C:\\Users\\me\\Downloads\\file_users.ODS';

// xlsx.readFile loads and parses the whole ODS file in memory
const workbook = xlsx.readFile(filePath);
const worksheet = workbook.Sheets[workbook.SheetNames[0]];

const posts = [];
let post = {};

// the worksheet keys are cell addresses such as 'A2', 'C2', 'J2', ...
for (const cell in worksheet) {
    const column = cell.toString()[0];

    if (column === 'A') {
        post['ID'] = worksheet[cell].v;
    } else if (column === 'C') {
        post['USER NAME'] = worksheet[cell].v;
    } else if (column === 'J') {
        // column J is the last one I need, so the record is complete
        post['USER EMAIL'] = worksheet[cell].v;
        if (Object.keys(post).length === 3) {
            posts.push(post);
        }
        post = {};
    }
}

...returns:

{
    ID: '1',
    'USER NAME': 'John Paul',
    'USER EMAIL': 'Paul.John12@hotmail.com'
  },
  {
    ID: '2',
    'USER NAME': 'Julia',
    'USER EMAIL': 'lejulie31312@outlook.com'
  },
  {
    ID: '3',
    'USER NAME': 'Greg Norton',
    'USER EMAIL': 'thenorton31031@hotmail.com'
  },
  ... 44660 more items
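
That posts array then goes through csv-writer to produce the file for LOAD DATA INFILE, roughly like this (a sketch; the output path and header titles are placeholders):

const createCsvWriter = require('csv-writer').createObjectCsvWriter;

// the header ids must match the keys used in the `posts` objects above
const csvWriter = createCsvWriter({
    path: 'C:\\Users\\me\\Downloads\\file_users.csv',   // placeholder output path
    header: [
        { id: 'ID', title: 'ID' },
        { id: 'USER NAME', title: 'USER NAME' },
        { id: 'USER EMAIL', title: 'USER EMAIL' }
    ]
});

csvWriter.writeRecords(posts)
    .then(() => console.log('CSV written'));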

So, my problem is with large ODS files. The output above comes from running the script on a 78 MB file, and that already uses about 1,600 MB of RAM. When I try a 900 MB file, memory hits the limit (4,000 MB+) and I get the error 'ERR_STRING_TOO_LONG'.

I tried to use the readline package to parse the data, but it needs a stream.
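
What I mean is that readline only works on top of a readable stream, something like the sketch below (placeholder path), and I can't get a line-oriented stream out of an ODS file because it is zipped binary data:

const fs = require('fs');
const readline = require('readline');

// readline consumes a stream line by line; this works for a plain-text CSV,
// but not for an ODS file, which is binary (zipped XML)
const rl = readline.createInterface({
    input: fs.createReadStream('C:\\Users\\me\\Downloads\\file_users.csv'),
    crlfDelay: Infinity
});

rl.on('line', (line) => {
    // each `line` is one row of the CSV
});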

If I have to slice the ODS files into smaller pieces, how could I read the file that way without crashing VS Code?
