2

I have implemented a program where I am trying to read an excel file(.xlsx) using SAX and XSSF(POI API) with the jar version 4.1 where it works alright. But it gives compilation error in POI version 3.15.

The code looks some thing like this(which works fine in 4.10):

Default Handler Implementation (SAXExcelSheetHandler.java):

import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXExcelSheetHandler extends DefaultHandler {
    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;

    public SAXExcelSheetHandler(SharedStringsTable sst) {
        this.sst = sst;
    }

    public void startElement(String uri, String localName, String name,
                             Attributes attributes) throws SAXException {
        // c => cell
        if(name.equals("c")) {
            // Print the cell reference
            System.out.print(attributes.getValue("r") + " - ");
            // Figure out if the value is an index in the SST
            String cellType = attributes.getValue("t");
            if(cellType != null && cellType.equals("s")) {
                nextIsString = true;
            } else {
                nextIsString = false;
            }
        }
        // Clear contents cache
        lastContents = "";
    }

    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once
        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);
            lastContents = sst.getItemAt(idx).getString();
            nextIsString = false;
        }

        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
            System.out.println(lastContents);
        }
    }

    public void characters(char[] ch, int start, int length) {
        lastContents += new String(ch, start, length);
    }
}

Actual Driver Class(POISaxXaafV2.java):

import java.io.File;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.ooxml.util.SAXHelper;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class POISaxXaafV2 {

    public static void main(String[] args) throws Exception {   
        String fullFilePath = "E:\\Downloads\\6038_Open_AR.XLSX";       
        File file = new File(fullFilePath);

        OPCPackage opcPackage = OPCPackage.open(file);
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();

        XMLReader xmlParser = SAXHelper.newXMLReader();
        ContentHandler contentHandler = new SAXExcelSheetHandler(sharedStringsTable);
        xmlParser.setContentHandler(contentHandler);

        Iterator<InputStream> sheetsIterator = xssfReader.getSheetsData();
        while(sheetsIterator.hasNext()) {
            System.out.println("Processing new sheet:\n");
            InputStream sheet = sheetsIterator.next();
            InputSource sheetSource = new InputSource(sheet);
            xmlParser.parse(sheetSource);
            sheet.close();
            System.out.println("");
        }       
    }
}

So far so good and it works fine.

Now I try to implement this in my workplace where are they are using the 3.15 version of POI instead of 4.1.0. And alas here I am getting compilation errors

1st compilation error :

in the default handler implementation class : SAXExcelSheetHandler.java

at line 39:

lastContents = sst.getItemAt(idx).getString();

It says

The method getItemAt(int) is undefined for the type SharedStringsTable

2nd compilation error

This is in the driver class : POISaxXaafV2.java

at line number 4:

import org.apache.poi.ooxml.util.SAXHelper

It says :

The import org.apache.poi.ooxml cannot be resolved

And because of the above error eventually I get compilation error again at line 22:

XMLReader xmlParser = SAXHelper.newXMLReader();

because the import did not work :

SAXHelper cannot be resolved

Does this needs to written differently for version 3.15 than what I wrote in 4.10?

Please help.

DockYard
  • 989
  • 2
  • 12
  • 29
  • This POI API(Event) is too confusing and poorly documented I believe :( – Zeus Aug 16 '19 at 11:56
  • 1
    Apache POI [3.15 is about 3 years old](http://poi.apache.org/changes.html#3.15), and has a number of known security issues, why are you trying to use an old version? – Gagravarr Aug 16 '19 at 12:01
  • @Gagravarr I am trying to implement this at my workplace. They are getting these jars as maven dependency from a common module. Trying to upgrade it now might require regression from a lot of other folks. – DockYard Aug 16 '19 at 12:05
  • @Gagravarr Is implementing this(Event API SAX XSSF) not feasible in 3.15? The reason I chose this is because my existing User model of POI was throwing out of memory error for big files – DockYard Aug 16 '19 at 12:07
  • The Event API has been there since 3.5, so something is possible. Just take a look at the source code for things like `XLSX2CSV` in the examples for the 3.15 source tree to see how on that older version – Gagravarr Aug 16 '19 at 13:27

1 Answers1

3

As told you in comments already, please do not insist in outdated software versions. Doing that is not the way we do progress in software development.

But of course, the XSSF and SAX (Event API) also had exist for version 3.15.

But in this old version SharedStringsTable had not had method getItemAt. It had only getEntryAt which had return a org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst. So the code in class SAXExcelSheetHandler would must be changed like:

...
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
...

...
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // Process the last contents as required.
        // Do now, as characters() may be called more than once
        if(nextIsString) {
            int idx = Integer.parseInt(lastContents);
            //lastContents = sst.getItemAt(idx).getString();
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            nextIsString = false;
        }

        // v => contents of a cell
        // Output after we've seen the string contents
        if(name.equals("v")) {
            System.out.println(lastContents);
        }
    }

...

Also there was not already a SAXHelper. So the code of class POISaxXaafV2 would must be changed like:

...
import java.io.File;
import java.io.InputStream;
import java.util.Iterator;
//import org.apache.poi.ooxml.util.SAXHelper;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
...

...
        OPCPackage opcPackage = OPCPackage.open(file);
        XSSFReader xssfReader = new XSSFReader(opcPackage);
        SharedStringsTable sharedStringsTable = xssfReader.getSharedStringsTable();

        //XMLReader xmlParser = SAXHelper.newXMLReader();
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SAXParser parser = parserFactory.newSAXParser();
        XMLReader xmlParser = parser.getXMLReader();
        ContentHandler contentHandler = new SAXExcelSheetHandler(sharedStringsTable);
        xmlParser.setContentHandler(contentHandler);
...
Axel Richter
  • 56,077
  • 6
  • 60
  • 87