I am using the following API (Tablesaw) to read data from an Excel file as a Table: https://jtablesaw.github.io/tablesaw/gettingstarted

The code is as follows:

    XlsxReader reader = new XlsxReader();
    XlsxReadOptions options = XlsxReadOptions.builder("excel/file_example_XLSX_10.xlsx").build();
    try {
        tab = reader.read(options);
        // System.out.println(tab.print());
    } catch (Exception e) {
        e.printStackTrace();
    }

The file file_example_XLSX_10.xlsx is around 120 MB in size, and I am getting an OutOfMemoryError.

Is there a way for me to read only specific columns from the file?

Naresh Chaurasia

2 Answers

I don't think there's a way to read only certain columns. Have you tried using Apache POI to read the Excel file instead, or increasing the JVM memory when running?

  • I am trying to avoid Apache POI because it is easier to work with the data as a table in Tablesaw. If nothing else works, I might end up using Apache POI. Yes, I did try increasing the memory size, but it did not help. – Naresh Chaurasia Jun 18 '20 at 10:09

I'm not familiar with reading Excel files, but if you can export it as one or more CSVs, here are a couple of things to look at:

1) You can read files in a way that minimizes memory use. For convenience, Tablesaw does not use the smallest possible numeric types. It defaults to int and double. You can ask it to use less memory, so that it will pick a short or float for a column if the given data will fit.

    Table t = Table.read()
       .csv(CsvReadOptions.builder("../myfile.csv")
          .minimizeColumnSizes()
    );

This might work for Excel also as it's defined in ReadOptions, rather than the more specific CsvReadOptions.
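To get a feel for how much narrowing the types can save, here is some rough arithmetic in plain Java. It is only a sketch: `ColumnMemoryEstimate` and its methods are made-up names for illustration, not part of Tablesaw, and the numbers cover raw primitive storage only, ignoring any overhead the library adds.

```java
// Back-of-the-envelope estimate of raw column storage.
// Hypothetical helper, not Tablesaw API; ignores object headers and library overhead.
final class ColumnMemoryEstimate {

    /** Bytes needed to hold `rows` values of a primitive that is `bytesPerValue` wide. */
    static long columnBytes(long rows, int bytesPerValue) {
        return rows * bytesPerValue;
    }

    /** True if every value fits in a short, i.e. an int column could be narrowed. */
    static boolean fitsInShort(int[] values) {
        for (int v : values) {
            if (v < Short.MIN_VALUE || v > Short.MAX_VALUE) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `columnBytes(10_000_000L, Integer.BYTES)` gives 40,000,000 bytes for an int column, while the same column stored as short (`Short.BYTES`) needs 20,000,000. The same halving applies when a double column can be held as float.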

2) Alternatively, for CSV you can specify an array of ColumnTypes, one of which can be ColumnType.SKIP. Again, this can be done using CsvReadOptions.
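If even that is too heavy, the same idea of dropping columns at read time can be applied before Tablesaw ever sees the file. The sketch below is plain Java, not Tablesaw API: `CsvColumnProjector` is a hypothetical helper that streams a CSV line by line and writes out only the column indexes you ask for. It assumes a simple comma-separated file with no quoted fields or embedded commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical helper (not part of Tablesaw): copies selected columns of a CSV
// from `in` to `out`, holding only one line in memory at a time.
final class CsvColumnProjector {

    /**
     * @param keep zero-based indexes of the columns to keep, in output order
     */
    static void project(Reader in, Writer out, int... keep) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split: assumes no quoted fields containing commas.
                String[] cells = line.split(",", -1);
                StringBuilder row = new StringBuilder();
                for (int i = 0; i < keep.length; i++) {
                    if (i > 0) {
                        row.append(',');
                    }
                    // Short rows yield an empty cell instead of failing.
                    if (keep[i] < cells.length) {
                        row.append(cells[keep[i]]);
                    }
                }
                out.write(row.append('\n').toString());
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Called with a FileReader on the exported CSV and a FileWriter on a new file, e.g. `project(in, out, 0, 2)`, this would leave a smaller CSV containing only the first and third columns, which can then be loaded the usual way.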

With CSV at least, 150 MB isn't too big for a typical desktop app. I read an 800 MB file yesterday without a problem and without touching the JVM memory settings in IDEA. OTOH, I'm not on the latest version, so YMMV.

larry