2

I'm parsing an Excel spreadsheet with a date in it. The results from POI are off by 1 second compared to what's displayed in Excel.

The unformatted data in Excel is: 43261.5027743056 The cell in Excel has a format of: mm/dd/yyyy hh:mm:ss The field in Excel displays as: 6/10/2018 12:04:00 PM

The POI parser (v 4.0.1 and 4.1.0 both) parse it as:

  • Value: 43261.502774305598
  • Format: mm/dd/yyyy\ hh:mm:ss
  • Result: 6/10/2018 12:03:59 PM

Here's my code:

private final DataFormatter formatter;

case NUMBER:
    String n = value.toString();
    if (this.formatString != null) {
      thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
    } 
    else thisStr = n;
    break;

Am I doing something wrong?

JJJ
  • 32,902
  • 20
  • 89
  • 102
MSquared
  • 322
  • 3
  • 15

2 Answers2

2

The problem is not the binary floating point problem. This also exists but it should not impact seconds of time.

The problem is that your value 43261.5027743056 is not really exact the date time 06/10/2018 12:04:00 but 06/10/2018 12:03:59.700. So it is 06/10/2018 12:03:59 plus 700 milliseconds. You could see this if you would formatting the cell using the format DD/MM/YYYY hh:mm:ss.000 in Excel.

For such values there is a discrepancy between Excel's date formatting and apache poi's DataFormatter, which uses Java's date format. When Excel shows the date time value 06/10/2018 12:03:59,700 without milliseconds, then it rounds to seconds internally. So 06/10/2018 12:03:59.700 is shown as 06/10/2018 12:04:00. Java's date formatters don't round but simply don't show the milliseconds. So 06/10/2018 12:03:59.700 is shown as 06/10/2018 12:03:59.

Apache poi's DateUtil provides methods which rounds seconds. But those methods seems not be used in DataFormatter.

As workaround we could override formatCellValue of DataFormatter to do so.

Complete example:

Excel:

enter image description here

Code:

import java.io.FileInputStream;

import org.apache.poi.util.LocaleUtil;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.formula.ConditionalFormattingEvaluator;

import java.util.Date;

class ExcelParseCellValues {

 public static void main(String[] args) throws Exception {

  Workbook workbook  = WorkbookFactory.create(new FileInputStream("Excel.xlsx"));

  DataFormatter dataFormatter = new DataFormatter() {
   @Override
   public String formatCellValue(Cell cell, FormulaEvaluator evaluator, ConditionalFormattingEvaluator cfEvaluator) {
    CellType cellType = cell.getCellType();
    if (cellType == CellType.FORMULA) {
     if (evaluator == null) {
      return cell.getCellFormula();
     }
     cellType = evaluator.evaluateFormulaCell(cell);
    }
    if (cellType == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell, cfEvaluator)) { //we have a date
     CellStyle style = cell.getCellStyle();
     String dataFormatString = style.getDataFormatString();
     if (!dataFormatString.matches(".*(s\\.0{1,3}).*")) { //the format string does not show milliseconds
      boolean use1904Windowing = false;
      if ( cell != null && cell.getSheet().getWorkbook() instanceof Date1904Support)
       use1904Windowing = ((Date1904Support)cell.getSheet().getWorkbook()).isDate1904();
      boolean roundSeconds = true; //we round seconds
      Date date = DateUtil.getJavaDate(cell.getNumericCellValue(), use1904Windowing, LocaleUtil.getUserTimeZone(), roundSeconds);
      double value = DateUtil.getExcelDate(date);
      return super.formatRawCellContents(value, style.getDataFormat(), dataFormatString, use1904Windowing);
     }
    }
    return super.formatCellValue(cell, evaluator, cfEvaluator);
   }
  };

  CreationHelper creationHelper = workbook.getCreationHelper();

  FormulaEvaluator formulaEvaluator = creationHelper.createFormulaEvaluator();

  Sheet sheet = workbook.getSheetAt(0);

  for (Row row : sheet) {
   for (Cell cell : row) {
    String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
    System.out.print(cellValue + "\t");
   }
   System.out.println();
  }

  workbook.close();

 }
}

Result:

Description of value  Floatingpoint value  DD/MM/YYYY hh:mm:ss.000    DD/MM/YYYY hh:mm:ss   
Your example value    43261,5027743056     06/10/2018 12:03:59.700    06/10/2018 12:04:00   
Exact Datetime 12:04  43261,5027777778     06/10/2018 12:04:00.000    06/10/2018 12:04:00   
Exact minus 500 ms    43261,5027719907     06/10/2018 12:03:59.500    06/10/2018 12:04:00   
Exact plus 500 ms     43261,5027835648     06/10/2018 12:04:00.500    06/10/2018 12:04:01   
Exact minus 501 ms    43261,5027719792     06/10/2018 12:03:59.499    06/10/2018 12:03:59   
Exact plus 501 ms     43261,5027835764     06/10/2018 12:04:00.501    06/10/2018 12:04:01   
Axel Richter
  • 56,077
  • 6
  • 60
  • 87
0

You're doing this when you parse the cell value as a double. Not all decimal values can be represented exactly as doubles. The nearest double to 43261.5027743056 is 43261.502774305597995407879352569580078125, which rounds to the value you're seeing.

StackOverthrow
  • 1,158
  • 11
  • 23
  • 1
    Is there a better way to parse the data than into a Double then? – MSquared Jul 01 '19 at 22:48
  • 1
    How about https://poi.apache.org/apidocs/dev/org/apache/poi/ss/usermodel/DataFormatter.html#formatCellValue-org.apache.poi.ss.usermodel.Cell- ? – PJ Fanning Jul 02 '19 at 17:16
  • @MSquared It's not a matter of how you parse. The value you're seeing in Excel fundamentally cannot be represented as a double. Perhaps you should use a decimal instead. – StackOverthrow Jul 02 '19 at 19:58