3

I am creating a web application which allows you to search some records from excel sheets.Excel sheets which need to be searched is a large one , it has 100k rows and 500 columns.

What approach should be use for this? I just want to allow users to perform search on this excel sheet,SO should I use database for it(for it first I need to set excel sheet values in database)? Or Is there any other better approach than this?

Does Apache POI provides any such direct method for searching excel files?

What will be most efficient approach for this scenario.Help with respective links will be appreciated

BalusC
  • 1,082,665
  • 372
  • 3,610
  • 3,555
Abhijeet Panwar
  • 1,837
  • 3
  • 26
  • 48

2 Answers2

2

Since the excel file you are dealing with is big, I would suggest you to use XSSF + SAX approach. With this approach you can stream XML files that store the data inside excel files. You can capture the data from the events generated by SAX parser. Compare the search data against each row. For each search, you will have to stream through the entire XML, but it is really fast.

------------EDITED----------------------------------

In detail,

Rename the .xlsx file as .zip. It will give you a zip file. Open it and you will see a folder called "xl". Inside "xl" folder concentrate on the following things.

worksheets folder : contains the xml representation of each sheet. sharedStrings.xml : Common place for storing all the string values across sheets. styles.xml : common place for holding the styles(color, formula...) applied to cells. workbook.xml : Contains the information about all sheets in the .xlsx file

Now start the journey from workbook.xm. Inside this file all sheets are covered in tag. sheets are represented as <sheet> Eg: <sheet name="General" sheetId="15" r:id="rId1"/> Take r:id, that will help you to access the General.xml which represents the General sheet of .xlsx file.

Once you have r:id, http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api will help you to read the XML file and get its contents.

OPCPackage pkg = OPCPackage.open(filename); //opens the .xlsx file
XSSFReader r = new XSSFReader( pkg ); //reads the xml fiels indide the .xlsx file
XMLReader parser = fetchSheetParser(sst); //SAX parser for parsing the XML files.

Feed the parser with XMl read using XSSFReader, write an event handler that implements DefaultHandler interface to capture the events generated by XMLReader.

SharedStringsTable sharedStringsTable = reader.getSharedStringsTable(); // using this we can access sharedStrings.xml 
StylesTable stylesTable = reader.getStylesTable(); // using this we can access styles.xml 

Since we are streaming the XML, we dont have to worry about memory consumption.
Renjith
  • 1,122
  • 1
  • 19
  • 45
  • I treid to understand your provided link , but could't understand it.Can you explaing that what it is and how it can help me in searching? – Abhijeet Panwar Jul 17 '14 at 09:54
  • i have editted it. please have a look and let me know if you need further clarification. – Renjith Jul 17 '14 at 10:01
  • Ohk , I got it that by using it we are saving some memory , right? that even in the lesser memory we will not get 'outofmemory' exception.But how it will be in searching records faster?How can I perform Search on this xml file? Will using database for it will be a better option or should I keep searching xml documents like this? – Abhijeet Panwar Jul 17 '14 at 10:09
  • Database is always a better option if you have time and memory to read and store such a big excel file. – Renjith Jul 17 '14 at 10:16
  • Time should not be that big constraint , right? I only will have to wait for first time , right? because only first time it will enter those values to database.Do we have any direct search method in Apache POI.Or we need to iterate through all rows for searching? – Abhijeet Panwar Jul 17 '14 at 10:20
  • You can consider my suggestion only if you want to avoid database interaction. Form your question, i felt like u want to 'I just want to allow users to perform search on this excel sheet,SO should I use database for it(for it first I need to set excel sheet values in database)?' – Renjith Jul 17 '14 at 10:22
  • Ohkk .. thanks for your answer.Actually I wanted to avoid database.I thought that there might be any direct method for searching in Apache-POI .We don't have it right?So in that case using database will be a better option I think. – Abhijeet Panwar Jul 17 '14 at 10:25
  • Sorry, I am not aware of any direct search method using apache POI. – Renjith Jul 17 '14 at 10:25
0

maybe you can try to import your excel spreadsheet content to MySQL or NoSQL (e.g. MongoDB). First you want to parse your Excel into database, then create a query interface. I canot thinking of an effective way for querying spreadsheets

TonyW
  • 18,375
  • 42
  • 110
  • 183
  • I just want to allow users to perform search on this excel sheet,SO should I use database for it(for it first I need to set excel sheet values in database).So my question is that can I have a better approach than this? – Abhijeet Panwar Jul 16 '14 at 12:39