17

I am creating a java program to read an excel sheet and create a comma separated file. When I run my sample excel file, with blank columns, The first row works perfectly, but the rest of the rows skip the blank cells. I have read about the code changes required to insert blank cells into the rows, but my question is why does the first row work ????


public ArrayList OpenAndReadExcel(){
    FileInputStream file = null;
    HSSFWorkbook workBook = null;
    ArrayList <String> rows = new ArrayList();

    //open the file

    try {
         file = new FileInputStream(new File("Fruity.xls"));
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        System.out.println("Could not open Input File");
        e.printStackTrace();
    }

    //  open the input stream as a workbook

        try {
             workBook = new HSSFWorkbook(file);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            System.out.println("Can't Open HSSF workbook");
            e.printStackTrace();
        }

        // get the sheet
        HSSFSheet sheet = workBook.getSheetAt(0);

        // add an iterator for every row and column
        Iterator<Row> rowIter = sheet.rowIterator();

        while (rowIter.hasNext())
        {

            String rowHolder = "";
            HSSFRow row = (HSSFRow) rowIter.next();
            Iterator<Cell> cellIter = row.cellIterator();
            Boolean first =true;
            while ( cellIter.hasNext())
            {
                if (!first)
                    rowHolder = rowHolder + ",";

                HSSFCell cell = (HSSFCell) cellIter.next();

                rowHolder = rowHolder + cell.toString() ;
                first = false;
            }

            rows.add(rowHolder);

        }

    return rows;

}
public void WriteOutput(ArrayList<String> rows) {

    // TODO Auto-generated method stub
    PrintStream outFile ;
    try {


        outFile = new PrintStream("fruity.txt");
        for(String row : rows)
        {   
            outFile.println(row);
        }
        outFile.close();

    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

}
-----
my Input in .xls file (Sorry don't know how to insert an excel table here )

Name >>>>>>>>>> Country of Origin >>>>>>>>> State of origin >>>>>>> Grade>>>>>> No of months
Apple >>>>>>>> USA >>>>>>>>>>>>>>>>>>>>>> Washington >>>>>>>>>>>>>> A >>>>>>>>> 6
orange >>>>>> USA >>>>>>>>>>>>>>>>>>>>>> Florida >>>>>>>>>>>>>>>>> A >>>>>>>>> 9
pineapple>>>>> USA >>>>>>>>>>>>>>>>>>>>>> Hawaii >>>>>>>>>>>>>>>>>> B >>>>>>>>> 10
strawberry>>>> USA >>>>>>>>>>>>>>>>>>>>>> New Jersey>>>>>>>>>>>>>> C >>>>>>>>>> 3

my output text file
Name ,Country of Origin,State of origin,,,Grade,No of months
Apple,USA,Washington,A,6.0
orange,USA,Florida,A,9.0
pineapple,USA,Hawaii,B,10.0
strawberry,USA,New Jersey,C,3.0


Notice the two extra commas before the Grade column... This is because I have two blank columns there.<br/>

These extra commas are missing in the rest of the output.

I am using Apache Poi-3.9-20121203.jar

user2047551
  • 183
  • 1
  • 1
  • 8
  • figured out this one. It is the formatting... First line is the title which was in BOLD... the BOLD was applicable even for the empty columns... So those columns were assumed to have blank value and not null... thanks – user2047551 Feb 06 '13 at 17:27
  • Consider editing the question text. – soufrk Apr 07 '17 at 10:22

2 Answers2

28

You should have a read through the Iterating Over Rows and Cells documentation on the Apache POI website.

The CellIterator will only return cells that have been defined in the file, which largely means ones with either values or formatting. The excel file format is sparse, and doesn't bother storing cells which have neither values nor formatting.

For your case, you must have formatting applied to the first row, which causes them to show up.

You need to read through the documentation and switch to lookups by index. That will also allow you full control over how blank vs never used cells are handled in your code.

Gagravarr
  • 47,320
  • 10
  • 111
  • 156
  • 16
    This is a really bad API design. Why? First, iteration over the cells by iterator differs from iterating by index - this is really surprising. Second, it does not offer me the structure I see when I open the spreadsheet. – Peter Jan 25 '17 at 09:59
  • 1
    Peter, I agree. – charlottesville Mar 15 '17 at 06:22
  • @user3153014 this was answered 5 years ago already. maybe this would help https://community.oracle.com/thread/2213406 – chip Aug 30 '18 at 12:38
  • 1
    @user3153014 Looks like some redirects were lost in the site refresh, should be back working again now – Gagravarr Sep 04 '18 at 19:58
  • From the Doc: "In some cases, when iterating, you need full control over how missing or blank rows and cells are treated, and you need to ensure you visit every cell".. No, that's pretty much ALL cases! – Black Jul 13 '21 at 03:03
  • @Black If iterating, you can ask the cells you get what their index or cell address is, and handle any gaps in your code. All depends on the approach you're taking - give me everything works well with iterator, give me column B is better with indexes and missing cell policies – Gagravarr Jul 13 '21 at 08:03
2

CellIterator does not iterate over cells which do not have any formatting or value applied.

First row must have had at least value or formatting applied.

If you want to read such cells as well you need to address it directly by specifying cell number

row.getCell(cellNumber, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);

Using Row.MissingCellPolicy.CREATE_NULL_AS_BLANK returns it as a blank cell.

michal.jakubeczy
  • 8,221
  • 1
  • 59
  • 63