0

This is an enhancement to my old question: Read empty cell using Apache POI Event model.

Actually I am trying to read empty cells and it works when empty cells occur in middle or end columns. However, if first column has empty cells, it doesn't treat it as BlankRecord.sid in below code. And due to this the value for that cell is set to empty string. I want to treat the first column too as BlankRecord so that it will be set to null.

Here is the code for xls:

public void processRecord(Record record) {
        int thisRow = -1;
        String thisStr = null;

        switch (record.getSid()) {
            case BoundSheetRecord.sid:
                boundSheetRecords.add(record);
                break;
            case BOFRecord.sid:
                BOFRecord br = (BOFRecord)record;
                if(br.getType() == BOFRecord.TYPE_WORKSHEET) {
                    // Works by ordering the BSRs by the location of their BOFRecords, and then knowing that we
                    // process BOFRecords in byte offset order
                    if(orderedBSRs == null) {
                        orderedBSRs = BoundSheetRecord.orderByBofPosition(boundSheetRecords);
                    }

                    // Check the existence of sheets
                    if(sheetIndex == 0) {
                        for(int i=0;i<excelSheetList.length;i++) {
                            boolean found = false;
                            for(int j=0;j<orderedBSRs.length;j++) {
                                if(this.getExcelSheetSpecification().equals(MSExcelAdapter.USE_WORKSHEET_NAME)) {
                                    String sheetName = ((BoundSheetRecord) boundSheetRecords.get(j)).getSheetname();
                                    if(excelSheetList[i].equals(sheetName)) {
                                        found = true;
                                        break;
                                    }
                                } else {
                                    try {
                                        if(Integer.parseInt(excelSheetList[i]) == j) {
                                            found = true;
                                            break;
                                        }
                                    } catch (NumberFormatException e) {
                                    }
                                }
                            }
                            if(!found)
                                this.warning("processRecord()","Sheet: " + excelSheetList[i] + " does not exist.");
                        }
                    }

                    readCurrentSheet = true;
                    sheetIndex++;
                    if(this.getExcelSheetSpecification().equals(MSExcelAdapter.USE_WORKSHEET_NAME)) {
                        String sheetName = ((BoundSheetRecord) boundSheetRecords.get(sheetIndex-1)).getSheetname();
                        if(!canRead(sheetName)) {
                            readCurrentSheet = false;                       
                        }
                    } else {
                        if(!canRead(sheetIndex + "")) {
                            readCurrentSheet = false;
                        }
                    }
                }
                break;

            case SSTRecord.sid:
                sstRecord = (SSTRecord) record;
                break;

            case BlankRecord.sid:
                BlankRecord brec = (BlankRecord) record;

                thisRow = brec.getRow();
                thisStr = null;
                values.add(thisStr);
                columnCount++;
                break;

            case FormulaRecord.sid:
                FormulaRecord frec = (FormulaRecord) record;

                thisRow = frec.getRow();
                if(Double.isNaN( frec.getValue() )) {
                    // Formula result is a string
                    // This is stored in the next record
                    outputNextStringRecord = true;
                    nextRow = frec.getRow();
                } else {
                    thisStr = formatListener.formatNumberDateCell(frec);
                }
                break;  

            case StringRecord.sid:
                if(outputNextStringRecord) {
                    // String for formula
                    StringRecord srec = (StringRecord)record;
                    thisStr = srec.getString();
                    thisRow = nextRow;
                    outputNextStringRecord = false;
                }
                break;

            case LabelSSTRecord.sid:
                if(readCurrentSheet) {
                    LabelSSTRecord lsrec = (LabelSSTRecord) record;
                    thisRow = lsrec.getRow() + 1;
                    if(rowNumberList.contains(thisRow + "") ||
                            (rowNumberList.contains(END_OF_ROWS) && thisRow >= secondLastRow)) {
                        if(sstRecord == null) {
                            thisStr = "(No SST Record, can't identify string)";
                        } else {
                            thisStr = sstRecord.getString(lsrec.getSSTIndex()).toString();
                        }
                    }
                }
                break;

            case NumberRecord.sid:
                if(readCurrentSheet) {
                    NumberRecord numrec = (NumberRecord) record;
                    thisRow = numrec.getRow() + 1;
                    if(rowNumberList.contains(thisRow + "") ||
                                (rowNumberList.contains(END_OF_ROWS) && thisRow >= secondLastRow)) {
                            thisStr = formatListener.formatNumberDateCell(numrec); // Format
                    }
                }
                break;
            default:
                break;
        }

        // Handle missing column
        if(record instanceof MissingCellDummyRecord) {
            thisStr = "";
        }

        // If we got something to print out, do so
        if(thisStr != null) {
            values.add(thisStr);
            columnCount++;
        }

        // Handle end of row
        if(record instanceof LastCellOfRowDummyRecord) { 
               .....
        }
        ...

With xlsx if the first column has empty cell it is being skipped. Here is the code for xlsx:

 /** 
     * Default handler for parsing an excel sheet
     * @see org.xml.sax.helpers.DefaultHandler
     */
    private class SheetHandler extends DefaultHandler {
        private SharedStringsTable sst;
        private String lastContents;
        private boolean nextIsString;
        private MSExcelReader reader;

        private int thisColumn = -1;
        private int lastColumnNumber = -1;  // The last column printed to the output stream

        private SheetHandler(SharedStringsTable sst, MSExcelReader reader) {
            this.sst = sst;
            this.reader = reader;
        }

        public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
            // c => cell
            if(name.equals("c")) {
                // Figure out if the value is an index in the SST
                String cellType = attributes.getValue("t");
                if(cellType != null && cellType.equals("s")) {
                    nextIsString = true;
                } else {
                    nextIsString = false;
                }
                // Get the cell reference
                String r = attributes.getValue("r");
                int firstDigit = -1;
                for (int c = 0; c < r.length(); ++c) {
                    if (Character.isDigit(r.charAt(c))) {
                        firstDigit = c;
                        break;
                    }
                }
                thisColumn = nameToColumn(r.substring(0, firstDigit));
            }
            // Clear contents cache
            lastContents = "";
        }

        public void endElement(String uri, String localName, String name) throws SAXException {
            // Process the last contents as required.
            // Do now, as characters() may be called more than once
            if(nextIsString) {
                try {
                    int idx = Integer.parseInt(lastContents);
                    lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
                } catch (NumberFormatException e) {
                }
            }

            // v => contents of a cell
            // Output after we've seen the string contents
            if(name.equals("v")) {
                for (int i = lastColumnNumber; i < thisColumn - 1; ++i)
                    values.add(null);  // Add empty string for missing columns

                values.add(lastContents);

                // Update column
                if (thisColumn > -1)
                    lastColumnNumber = thisColumn;
            }

            if(name.equals("row")) {
            ...

Same as my old question things to mention: I am not using the usermodel (org.apache.poi.ss.usermodel) but an Event API to process xls and xlsx files.

I am implementing HSSFListener and overriding its processRecord(Record record) method for xls files. For xlsx files I am using javax.xml.parsers.SAXParser and org.xml.sax.XMLReader.

I am using JDK7 with Apache POI 3.7. Can someone please help?

I have excel file with following columns which looks like this :-

Column1 Column2 Column3 Column4 Column5 Column6 Column7
        Parag   Joshi   Pune                100     
        Parag   Joshi   Pune    200         

The output generated by my code when I print all the values in excel is :-

;Parag;Joshi;Pune;null;100;null
;Parag;Joshi;Pune;200;null;null

See above that it printed empty string for first column but for other columnd it printed the value null. I want first column to print the same value null.

Community
  • 1
  • 1
ParagJ
  • 1,566
  • 10
  • 38
  • 56
  • For the XLS (HSSF) case, can you clarify what isn't working for you with MissingCellAwareRecord support? – Gagravarr Aug 08 '13 at 15:00
  • @Gagravarr: Thanks. Please see what is not working above. I have edited the question in the end. – ParagJ Aug 12 '13 at 07:04
  • @Gagravarr: Can you please help? – ParagJ Aug 14 '13 at 09:47
  • Take a look at [XLS2CSVmra](https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java) which is a POI example. That correctly manages to use the EventModel to print out a CSV, including missing cells. Compare your code to that, and use that to work out what you have done wrong! – Gagravarr Aug 14 '13 at 11:06
  • @Gagravarr it doesn't handle empty cells properly, it needs the minimum number of columns to work, which isn't a good way. – Jonathan Drapeau Sep 25 '14 at 13:32
  • @JonathanDrapeau The file format is a sparse one, so the user needs to say how many columns they care about, as that info isn't stored helpfully in the file at run time – Gagravarr Sep 25 '14 at 14:53
  • @Gagravarr And it's a shame it isn't there. You either have to know in advance or read the whole thing to figure it out. – Jonathan Drapeau Sep 25 '14 at 15:05

1 Answers1

0

If you have a cleaner way than this, please advice, thanks!

Create an ArrayList to store columnIndex;

    ArrayList<Integer> listAllColInRow = new ArrayList<>(); 

Create an ArrayList to store your data;

    addDataToRow = new ArrayList<>();    

Add each columnIndex to listAllColInRow;

    while (cells.hasNext()) {   
        cell = (HSSFCell) cells.next();    
        int col = cell.getColumnIndex();    
        listAllColInRow.add(col);    
    }   

Get the first Column index in each row so we can check if it is the the first column;

    Integer a = listAllColInRow.get(0);    

If its not the first column, add null to the first index of your arrayList which is addDataToRow in this example and so on until it reach the first column;

    // while a is not the 1st column    
    while( a != 0){    
        //add null to the 1st index of the ArrayList    
        addDataToRow.add(0,null);    
        a--;    
    }   
IIo0oII
  • 1
  • 3