
I'm learning about data-driven testing using Selenium and Excel. I'm taking an online course that has asked us to add the Apache poi and poi-ooxml dependencies in Maven.

I'm struggling to understand what the differences between the two are. Are both required in order to retrieve data in Excel and pass these to our tests?

Thanks

  • Does [the Apache POI website explanation](http://poi.apache.org/components/index.html#components) not cover you? – Gagravarr Feb 13 '20 at 23:29
  • I have already looked at that. It's still not clear what the difference is between the two JARs, hence my question. –  Feb 15 '20 at 17:03
  • Which file format do you want to work with? It lists all of the formats and which jars you need for which.... – Gagravarr Feb 15 '20 at 21:13
  • So the reason why the instructor has imported the two jars is due to working with different Excel formats? This is why I am confused. The instructor only uses .xlsx files, so my confusion stems from whether there was a need or a dependency to have both jars for any other reason. –  Feb 18 '20 at 22:59

1 Answer


Excel files have a long history:

  • Excel 97-2003 workbook:

    This is the legacy Excel format, a binary file format with the extension .xls. In Apache POI, the component that handles it is called HSSF (Horrible SpreadSheet Format). Because the binary format is complex and has a number of tricky characteristics, it has its own implementation, and the code to handle these files lives in the poi jar.


  • Excel 2007+ workbook:

    This is the default XML-based file format for Excel 2007 and later versions. It follows the Office Open XML (OOXML) format, a zipped, XML-based file format developed by Microsoft for representing office documents. The file extension is .xlsx (DOCX and PPTX are other OOXML-based examples). In Apache POI, the component that handles it is called XSSF (XML SpreadSheet Format). The format is newer than the one HSSF handles and supports additional features; the code to handle these files lives in the poi-ooxml jar.


  • More reading

Although .xls is almost dead, some applications still use it, so both dependencies are required for backward compatibility. Here is what Apache has to say:

  • HSSF (Excel XLS), Maven artifact poi: for HSSF only; if common SS is needed, see below.
  • Common SS (Excel XLS and XLSX), Maven artifact poi-ooxml: WorkbookFactory and friends all require poi-ooxml, not just core poi.

You can read more on their official website: http://poi.apache.org/components/index.html#components
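To make the "binary versus zipped XML" distinction concrete, here is a minimal sketch that looks at the first bytes of a file and reports which of the two artifacts holds the matching implementation. It uses only the Java standard library (Java 11+ for `readNBytes`), not POI itself. The class name `ExcelFormatSniffer`, the `Format` enum, and the `artifactFor` helper are hypothetical names made up for this illustration. It also assumes the input really is a spreadsheet: the same signatures appear on other OLE2 files (such as .doc) and on any ZIP archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Apache POI.
class ExcelFormatSniffer {

    /** Container formats that correspond to the two POI artifacts. */
    enum Format { OLE2_XLS, OOXML_XLSX, UNKNOWN }

    // OLE2 compound document signature, the container used by binary .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature ("PK" 0x03 0x04); .xlsx files are ZIP packages.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file by its leading bytes. */
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2_XLS;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_XLSX;
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies them. */
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    /** Maps a format to the artifact named in the answer above. */
    static String artifactFor(Format format) {
        switch (format) {
            case OLE2_XLS:   return "poi";        // HSSF
            case OOXML_XLSX: return "poi-ooxml";  // XSSF
            default:         return "unknown";
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExcelFormatSniffer path/to/workbook
        Format format = detect(Path.of(args[0]));
        System.out.println(format + " -> " + artifactFor(format));
    }
}
```

In real code you would not need this check: as the quoted note says, WorkbookFactory in poi-ooxml covers both formats. The sketch only shows why the two formats need separate handling code.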

vh1ne
  • Thank you vh1ne. That was very clearly explained and made total sense. –  Apr 26 '20 at 19:15
  • As of POI 4.1.2, `poi-ooxml` has a dependency on `poi`. So, is it still necessary to explicitly depend on both of them for backward compatibility? – Sayak Mukhopadhyay Nov 01 '20 at 12:56
  • @SayakMukhopadhyay - It's not because of backward compatibility, it's because of common base classes/interfaces and other shared classes, which reside in the poi artifact. – kiwiwings Nov 02 '20 at 12:22
  • @kiwiwings So I believe that depending on `poi` is not really needed if I am depending on `poi-ooxml` since `poi-ooxml` is already depending on `poi`. I guess if I am using an API which only exists in `poi` I should depend on it as a best practice. – Sayak Mukhopadhyay Nov 02 '20 at 13:53
  • @SayakMukhopadhyay OK, I misread your comment. You don't need to explicitly import the poi (main) artifact; as you pointed out, the Maven dependency takes care of that. But when not using Maven as a dependency mechanism, you need to have both on the class-/modulepath. – kiwiwings Nov 02 '20 at 14:04