1

I want to make use of the LightCells API within Aspose.Cells to help optimize my read and write processes while also reducing the memory utilization while processing the excel file.

I have queries regarding the implementation of this API. How can I use light cell APIs - LightCellsDataProvider and LightCellsDataHandler - to efficiently read and write Excel through less memory utilization. I specifically want to load the data of the cells in memory in chunks (for e.g. processing only the first 100 rows and then the next 100 and so on) so that less memory will be consumed in the entire process.

Want to implement below code but using lightCell api in aspose-

public static void createTemplate( String originalFilePath, List<String> originalFileSheetNameList, String templateFilePath ) throws Exception {    
        LoadOptions opts = new LoadOptions();
    opts.setMemorySetting(MemorySetting.MEMORY_PREFERENCE);
    Workbook wb = new Workbook(originalFilePath, opts);
    WorksheetCollection wsCollections = wb.getWorksheets();
        for(String sheetName: originalFileSheetNameList) {
            Worksheet sheet = wsCollections.get(sheetName);
            sheet.getCells().deleteRows(1, sheet.getCells().getMaxDataRow());
        }

    FileOutputStream outputStream = new FileOutputStream(templateFilePath);
    wb.save(outputStream, SaveFormat.XLSX);
    outputStream.close();
   
}

Below code is the Aspose.Cells light cell api implementation. Using LightCells how can I implement above code -

public static void createTemplate(String originalFilePath, List<String> originalFileSheetName, String templateFilePath ) throws Exception { 
        LoadOptions opts = new LoadOptions();
    LightCellsDataHandlerVisitCells v = new LightCellsDataHandlerVisitCells();
    opts.setLightCellsDataHandler((LightCellsDataHandler) v);
    Workbook wb = new Workbook(originalFilePath, opts);
    int sheetCount = wb.getWorksheets().getCount();
    v.startSheet(wb.getWorksheets().get("System_Data"));
    System.out.println("Total sheets: " + sheetCount + ", cells: " + v.cellCount + ", strings: " + v.stringCount
            + ", formulas: " + v.formulaCount);
   
}

For reading -

import com.aspose.cells.Cell; 
import com.aspose.cells.CellValueType; 
import com.aspose.cells.LightCellsDataHandler; 
import com.aspose.cells.Row; 
import com.aspose.cells.Worksheet;

public class LightCellsDataHandlerVisitCells implements LightCellsDataHandler { 
public int cellCount; 
public int formulaCount; 
public int stringCount;

public LightCellsDataHandlerVisitCells() {
    this.cellCount = 0;
    this.formulaCount = 0;
    this.stringCount = 0;
}

public int cellCount() {
    return cellCount;
}

public int formulaCount() {
    return formulaCount;
}

public int stringCount() {
    return stringCount;
}

public boolean startSheet(Worksheet sheet) {
    System.out.println("Processing sheet[" + sheet.getName() + "]");
    System.out.println("Processing sheet rows[" + sheet.getCells().getMaxDataRow() + "]");
    return true;
}

public boolean startRow(int rowIndex) {
    return true;
}

public boolean processRow(Row row) {
    return true;
}

public boolean startCell(int column) {
    return true;
}

public boolean processCell(Cell cell) {
    this.cellCount = this.cellCount + 1;
    cell.getRow();
    if (cell.isFormula()) {
        this.formulaCount = this.formulaCount + 1;
    } else if (cell.getType() == CellValueType.IS_STRING) {
        this.stringCount = this.stringCount + 1;
    }
    return false;
}
}

getting print -1 on System.out.println("Processing sheet rows[" + sheet.getCells().getMaxDataRow() + "]");

For writing -

public class LightCellsDataProviderDemo implements LightCellsDataProvider { 

private final int sheetCount; 
private final int maxRowIndex; 
private final int maxColIndex; 
private int rowIndex; 
private int colIndex; 
private final Style style1; 
private final Style style2;

public LightCellsDataProviderDemo(Workbook wb, int sheetCount, int rowCount, int colCount) {
    // set the variables/objects
    this.sheetCount = sheetCount;
    this.maxRowIndex = rowCount - 1;
    this.maxColIndex = colCount - 1;

    // add new style object with specific formattings
    style1 = wb.createStyle();
    Font font = style1.getFont();
    font.setName("MS Sans Serif");
    font.setSize(10);
    font.setBold(true);
    font.setItalic(true);
    font.setUnderline(FontUnderlineType.SINGLE);
    font.setColor(Color.fromArgb(0xffff0000));
    style1.setHorizontalAlignment(TextAlignmentType.CENTER);

    // create another style
    style2 = wb.createStyle();
    style2.setCustom("#,##0.00");
    font = style2.getFont();
    font.setName("Copperplate Gothic Bold");
    font.setSize(8);
    style2.setPattern(BackgroundType.SOLID);
    style2.setForegroundColor(Color.fromArgb(0xff0000ff));
    style2.setBorder(BorderType.TOP_BORDER, CellBorderType.THICK, Color.getBlack());
    style2.setVerticalAlignment(TextAlignmentType.CENTER);
}

public boolean isGatherString() {
    return false;
}

public int nextCell() {
    if (colIndex < maxColIndex) {
        colIndex++;
        return colIndex;
    }
    return -1;
}

public int nextRow() {
    if (rowIndex < maxRowIndex) {
        rowIndex++;
        colIndex = -1; // reset column index
        if (rowIndex % 1000 == 0) {
            System.out.println("Row " + rowIndex);
        }
        return rowIndex;
    }
    return -1;
}

public void startCell(Cell cell) {
    if (rowIndex % 50 == 0 && (colIndex == 0 || colIndex == 3)) {
        // do not change the content of hyperlink.
        return;
    }
    if (colIndex < 10) {
        cell.putValue("test_" + rowIndex + "_" + colIndex);
        cell.setStyle(style1);
    } else {
        if (colIndex == 19) {
            cell.setFormula("=Rand() + test!L1");
        } else {
            cell.putValue(rowIndex * colIndex);
        }
        cell.setStyle(style2);
    }
}

public void startRow(Row row) {
    row.setHeight(25);
}

public boolean startSheet(int sheetIndex) {
    if (sheetIndex < sheetCount) {
        // reset row/column index
        rowIndex = -1;
        colIndex = -1;
        return true;
    }
    return false;
}`

Source - https://forum.aspose.com/t/read-write-huge-excel-files-1m-rows/38441/8 https://docs.aspose.com/cells/java/using-lightcells-api/

1 Answers1

1

You need to understand the object model, internal structure and working of the light cells APIs first. The LightCells APIs (provided by Aspose.Cells) are mainly designed to manipulate cell data one by one without building a complete data model block (using the Cell collection, etc.) into memory. It works in an event-driven mode. To save workbooks, it will provide the cell content cell by cell when saving, and the component saves it to the output file directly. Similarly when reading template files, the component parses every cell and provides their value one by one.

In short, when using light cells APIs, one Cell object is processed and then discarded, the Workbook object does not hold the collection at all in the memory. This model would surely save lots of memory when handling large Excel file with data.

Please consider and note: For your needs, you may choose XLSX file format instead of XLS, because XLSX is an Open Office XML-based spreadsheet format. Generally, using LightCells API to save XLSX file may save 50% or more memory than using the common way, saving XLS may save about 20-40% memory.

I specifically want to load the data of the cells in memory in chunks (for e.g. processing only the first 100 rows and then the next 100 and so on) so that less memory will be consumed in the entire process.

As I mentioned above, you do not need to do this as in LightCells APIs, data is processed cell by cell (without building the complete data model for the whole spreadsheet) which is more performance oriented approach and will consume less memory and works efficiently. In light weight mode, one cell is processed and then discarded immediately and then it navigates to next cell and so on.

Want to implement below code but using lightCell api in aspose- public static void createTemplate( String originalFilePath, List originalFileSheetNameList, String templateFilePath ) throws Exception {
LoadOptions opts = new LoadOptions(); opts.setMemorySetting(MemorySetting.MEMORY_PREFERENCE); Workbook wb = new Workbook(originalFilePath, opts);

In LightCells mode, importing additional data, deleting rows, columns or range of cells may not work (considering the architecture and nature of the LightCells APIs), so you have to do such things outside of the implementation, i.e., perform the operations like removing rows/cols or cells in the class that implements the light cells interface.

Should you have further queries, doubts or issues working with light cells APIs, you may also post in the dedicated forums.

PS. I am working as Support developer/ Evangelist at Aspose.

Amjad Sahi
  • 1,813
  • 1
  • 10
  • 15
  • `so you have to do such things outside of the implementation, i.e., perform the operations like removing rows/cols or cells in the class that implements the light cells interface.` as you said so do I need to do this in write api i.e LightCellDataProvider Or at the time when I read the file using Read api i.e LightCellDataHandler. – Vaibhav Karale Apr 05 '22 at 02:34
  • And also when I do cell.calculate() in processCell() and return true in LightCellDataHandler it takes less memory but more time to process the file having 100k rows Or more. – Vaibhav Karale Apr 05 '22 at 02:40
  • I have also created a new topic on aspose forums. Can you please have a look and help me with this https://forum.aspose.com/t/read-data-from-one-file-and-write-to-another-file-using-light-cell-api-for-larger-files/244398 – Vaibhav Karale Apr 05 '22 at 03:49
  • 1
    Regarding: "as you said so do I need to do this in write api i.e LightCellDataProvider Or at the time when I read the file using Read api i.e LightCellDataHand" I mean you have to do such operations outside and not in the classes which implement LightCellDataProvider interface or LightCellDataHandler interface. – Amjad Sahi Apr 05 '22 at 10:54
  • I am processing files with multiple sheets which always has one sheet containing formulaes with minimum 100k rows. When I load the file using LightCellDataHandler and when it loads the sheet containing formulaes takes more memory than other sheets. So how I load the formula sheet fast and with less memory..?? Also I want to refresh the formulaes after loading the sheet. So which will be faster and efficient in case of memory - `workbook.calculateFormula()` or `sheet.calculateFormula()` or `cell.calculate()` by looping through each cell? – Vaibhav Karale Apr 05 '22 at 13:42
  • As you are processing lots of formulas in the spreadsheet, so you should not use LightCells APIs for your task. Also, since you are using/manicuring lots of formulas, so either Workbook.calculateFormula() or Worksheet.calculateFormula() would work better for your case. – Amjad Sahi Apr 06 '22 at 08:46
  • yeah got it. Will try out. Thank you – Vaibhav Karale Apr 07 '22 at 14:20