0

Quick question: I've been asked to create a couple of parsers for the XLSX file format. Pretty much everywhere I've read says to grab the POI libraries; however, the system I am working on is very touchy about bringing in external APIs, so I'd far rather do some extra legwork myself than go down that route.

So is it possible (without spending days of coding) to parse an XLSX file with a SAXParser, or am I a mug if I don't use the POI libraries?

Cheers

* UPDATE *

Since extracting the XLSX file and having a better look at the archive, I believe I can now parse these files without spending days coding; I could probably extract the information within a few hours. I am, however, only looking to extract the actual cell data and not any reference data for those values (i.e. the cell references). I am also looking to extract the XLSX metadata. I'll provide a quick answer on how I did this when I am done, for future reference.

Ally
  • Have you thought about looking at the source code of Apache POI to get a sense of how much work is involved? Especially the event based parsing that POI offers – Gagravarr Apr 27 '12 at 10:57

3 Answers

0

A standard XLSX file is not XML, so no, it's not possible.

Correction: Walter Laan is correct. The XLSX format is indeed a zip file full of XML files and should be relatively easy to parse.

Peter
  • 2
    It's a bunch of XML files in a zip file, so possible but probably not without spending days of coding. – Walter Laan Apr 27 '12 at 10:05
  • Walter is correct, XLSX is an archive container for multiple XML files relating to style, theme, etc. They are highly structured as well, so parsing should be relatively straightforward! – Ally Apr 27 '12 at 10:42
  • Indeed it is, I did not know that. – Peter Apr 27 '12 at 13:52
0

It's not possible without spending a few days coding; you will have to write code for at least two or three days. It's just a zip file, but it contains a bunch of XML files plus a manifest XML.

Kamal
0

Effectively I did this, but obviously tailored my Java to read the specific XLSX XML structure.

To open the XLSX in Java, use the ZipFile and ZipEntry APIs (java.util.zip) and enumerate the entries so that you drill down through all the folder structures in the archive. Then follow the guide below to read the XML:

http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
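
For future readers, here is a rough sketch of the two steps above (enumerate the zip entries, then feed the XML parts to a SAX handler). It is not my exact code, just a minimal illustration using only the JDK. The class name XlsxSaxSketch and its method names are made up for this example. It assumes the usual layout where sheets live under xl/worksheets/ and string cells are indexes into xl/sharedStrings.xml; inline strings, dates and formulas are not handled.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
final class XlsxSaxSketch {

    /** Enumerates every entry in the archive, e.g. "xl/worksheets/sheet1.xml". */
    static List<String> listEntries(ZipFile zip) {
        List<String> names = new ArrayList<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            names.add(entries.nextElement().getName());
        }
        return names;
    }

    /** Streams one XML part of the archive through the given SAX handler. */
    static void parsePart(ZipFile zip, String partName, DefaultHandler handler) throws Exception {
        ZipEntry entry = zip.getEntry(partName);
        if (entry == null) {
            return; // the part is optional, e.g. no sharedStrings.xml without string cells
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so the localName argument is populated
        try (InputStream in = zip.getInputStream(entry)) {
            factory.newSAXParser().parse(in, handler);
        }
    }

    /** Reads xl/sharedStrings.xml: one string per <si>, joining all of its <t> runs. */
    static List<String> readSharedStrings(ZipFile zip) throws Exception {
        List<String> strings = new ArrayList<>();
        parsePart(zip, "xl/sharedStrings.xml", new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inText;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (local.equals("si")) text.setLength(0);
                if (local.equals("t")) inText = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (local.equals("t")) inText = false;
                if (local.equals("si")) strings.add(text.toString());
            }
        });
        return strings;
    }

    /** Returns the <v> values of one sheet in document order, ignoring cell references. */
    static List<String> readCellValues(ZipFile zip, String sheetPart) throws Exception {
        List<String> shared = readSharedStrings(zip);
        List<String> values = new ArrayList<>();
        parsePart(zip, sheetPart, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String cellType; // value of the t attribute on <c>, may be null
            private boolean inValue;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (local.equals("c")) cellType = atts.getValue("", "t");
                if (local.equals("v")) {
                    inValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inValue) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (!local.equals("v")) return;
                inValue = false;
                String raw = text.toString();
                // t="s" means the value is an index into the shared strings table
                values.add("s".equals(cellType) ? shared.get(Integer.parseInt(raw.trim())) : raw);
            }
        });
        return values;
    }
}
```

You would open the workbook with something like new ZipFile("book.xlsx") (the file name is a placeholder) in a try-with-resources block and call readCellValues(zip, "xl/worksheets/sheet1.xml"). The metadata usually sits in docProps/core.xml and docProps/app.xml, and those can go through the same parsePart method with a handler of your own.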

Cheers

Ally