15

I need to create a method that can read both xls and xlsx files. According to my research, HSSF is used to read xls and XSSF to read xlsx. Is there a part of the Apache POI I can use to read both files? I also came across the ss.usermodel but found no sufficient codes that will entertain both xls and xlsx....

TOMAS DEL CASTILLO
  • 167
  • 1
  • 1
  • 5
  • WorkbookFactory.create() must be the accepted answer. http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html – Kishan B May 15 '16 at 19:57

7 Answers7

20

Yes, there's a new set of interfaces provided by POI that work with both types.

Use the WorkbookFactory.create() method to get a Workbook: http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/WorkbookFactory.html

You can check for excel files without relying on file extensions (which are unreliable - many csv files have xls extensions for example but cannot be parsed by POI) using the following:

//simple way to check for both types of excel files
public boolean isExcel(InputStream i) throws IOException{
    return (POIFSFileSystem.hasPOIFSHeader(i) || POIXMLDocument.hasOOXMLHeader(i));
}
tom
  • 2,704
  • 16
  • 28
18

I haven't had much exp with Apache POI, but as far as i know if you refer to a workbook by class "Workbook" then you can read and write both xls & xlsx.

All you have to do is when creating object write

for .xls-

Workbook wb = new HSSFWorkbook();

for .xlsx-

Workbook wb = new XSSFWorkbook();

you can pass a parameter for file type and create the WorkBook object accordingly using If statement.

Sumit Gupta
  • 437
  • 4
  • 12
  • Hello! I also have the said codes in mind and I was planning to use those if I cannot find any other way to detect the file extension. – TOMAS DEL CASTILLO Oct 03 '13 at 02:54
  • I recommend you to check this basic utility to read all excel files: https://github.com/eaorak/excelr – koders Mar 04 '15 at 18:59
13

you can read using poi-ooxml and poi-ooxml-schema jars provided by apache.

and use below code:--

Workbook wb = null;
excelFileToRead = new FileInputStream(fileName);
wb = WorkbookFactory.create(excelFileToRead); 
Sheet sheet = wb.getSheet(sheetName);

the above code will read both xls and xlsx files

Prashant Gautam
  • 589
  • 8
  • 10
3

Thanks to Tom's answer just to add, use foll. code to get inputstream else we may face Exception in thread "main" java.io.IOException: mark/reset not supported

     InputStream inputStream = new FileInputStream(new File("C:\\myFile.xls"));

     if(! inputStream.markSupported()) {
                inputStream = new PushbackInputStream(fileStream, 8);
     }   
Amit
  • 465
  • 2
  • 6
  • 20
3

You can use

Workbook wb = WorkBookFactory().create(inputStream); 
tanle
  • 61
  • 1
  • 6
  • Using a `File` is generally better, see http://poi.apache.org/spreadsheet/quick-guide.html#FileInputStream – Gagravarr Jun 16 '17 at 12:17
1

one option would be to check the file name with lastIndexOf for . and see if it is .xls or xlsx and then use an if condition to switch accordingly. been a long time since i worked on poi but i think it the attributes are like HSSF for .xls and XSSF for .xlsx refer http://poi.apache.org/ site, last line under the topic Why should I use Apache POI?

Balaji Krishnan
  • 1,007
  • 3
  • 12
  • 29
0

It appears you are looking for a way to abstract the read process, you are saying it doesn't matter if its XLS or XLSX, you want your code to work without modification.

I'd recommend you to look at Apache Tika, its an awesome library that abstracts file reading and content parsing, it uses POI and many other libraries and has a nice abstraction to all of them.

reading a PDF/XLS/XLSX is similar to reading a text file, all the work is done behind the scene.

read this for more. http://www.searchworkings.org/blog/-/blogs/introduction-to-apache-tika

Anantha Sharma
  • 9,920
  • 4
  • 33
  • 35