2

In ExcelJs package there is dataValidation or _dataValidation attribute for cell object. This gives the type of data in cell like list, dropdown, decimal, boolean etc.

But, for some reason, it is throwing Javascript out of heap memory error for some files, So, I have increased the memory limit to 16GB (it is the max I can use) by node --max-old-space-size=[size_in_GB*1024] index.js but it failed to parse.

So, I am looking for alternative packages like sheet js and others but I coun't get all the data validations as in ExcelJs mainly options in a dropdown of an Excel file.

Please help me to solve this issue. Thank you.

PS: I have checked this question How to Get the DataType of excel cell value using nodejs version16.14.2 and Vue js and the convert-excel-to-json don't provide any validations on the cell.

I am getting this error while parsing the particular Excel file.

Error image

This is the code snippet that I have used and I got the above error after few minutes.

const ExcelJS = require('exceljs');
const fs = require("fs");

async function readExcelFile(filePath) {
  const workbook = new ExcelJS.Workbook();
  const stream = fs.createReadStream(filePath);

  await workbook.xlsx.read(stream);

  const worksheet = workbook.getWorksheet(1);

  worksheet.eachRow((row, rowNumber) => {
    
      row.eachCell((cell, colNumber) => {
        console.log(cell.dataValidations, cell._dataValidations);
        console.log(`Cell value at row ${rowNumber}, column ${colNumber}: ${cell.value}`);
      });
    });
}

readExcelFile('PAG KIC TAB A (1)(1).xlsx');
  • I have used many xlsx libs and exceljs was the best one of them, so let's figure it out. Please provide the `dataValidation` in question and ideally stack trace. – Dan Macak May 22 '23 at 13:16
  • No worries. So you are reading some xlsx files and it runs out of memory. Did you identify which file and ideally which `dataValidation` causes it? Paste the `dataValidation` into your question if you find it please. – Dan Macak May 23 '23 at 06:47
  • this is the [file](https://docs.google.com/spreadsheets/d/15Fc9s4zjT8V-MQvtrDjkJ4Acp8imh8xL/edit?usp=sharing&ouid=101228691583733762819&rtpof=true&sd=true), I didn't even get the parsed output. –  May 23 '23 at 09:55
  • I was able to parse this file just fine. Please add which version of exceljs and electron you are using, perhaps also code snippet of how you read and parse the file. What makes you think it's the `dataValidation` that causes the OOM? – Dan Macak May 23 '23 at 17:43
  • I have added the code snippet. Actually, while parsing itself I got the heap memory out error. So, I didn't get any results. –  May 25 '23 at 13:15
  • Thanks, I reproduced the issue. There is something wrong with that file, I can't even open it in One Drive. When I open it locally, it says there are some invalid links. Just for the testing, can you break the links and run you parsing again? https://support.microsoft.com/en-us/office/break-a-link-to-an-external-reference-in-excel-f1ca8b08-4f24-4af6-92e5-f4fdb1442748 – Dan Macak May 25 '23 at 16:56
  • Yes, I know without external links it works fine. There are more than 100 external links. But, as it will be uploaded by the user, I just wanted to check if it is possible to resolve this. –  May 29 '23 at 08:33

1 Answers1

0

I had a look why the OOM happens and the culprit is actually a defined name (MS docs, ExcelJS docs) from a sheet named "[31]IRRs UPDATE EACH QUARTER". It references a huge range "C6:XFD1048576" which ExcelJS goes through cell by cell and by doing so uses up all memory. No wonder your workbook doesn't open in MS Office cloud.

Now depending on what kind of processing you need, you can either inspect that offending sheet whether this defined name with such huge range makes even sense, fix/remove it and try again.

Or you can use ExcelJS's Streaming API, like this:

import ExcelJS from "exceljs";

async function readExcelFile(filePath: string) {
  const workbookReader = new ExcelJS.stream.xlsx.WorkbookReader(filePath, {});

  for await (const worksheetReader of workbookReader) {
    for await (const row of worksheetReader) {
      row.eachCell((cell) => {
        console.log(cell.value);
      });
    }
  }
}

readExcelFile("PAG KIC TAB A (1)(1).xlsx");

As is the case with streaming in general, it doesn't buffer all data in memory, but rather only processes one batch of data at a time, and is therefore much more efficient with memory usage. But I also found that this way of reading doesn't process the defined names at all, which can be a no-go for you.

Dan Macak
  • 16,109
  • 3
  • 26
  • 43