I am parsing TransXChange data, which includes some very large files of nearly 800 MB. When I try to parse these files, I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.ArrayList.<init>(Unknown Source)
    at java.util.ArrayList.<init>(Unknown Source)
    at JourneyPatternSections.<init>(JourneyPatternSections.java:21)
    at ReadBusData.startElement(ReadBusData.java:131)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at ReadBusData.parseDocument(ReadBusData.java:51)
    at ReadBusData.<init>(ReadBusData.java:41)
    at ReadBusData.main(ReadBusData.java:218)

I am following this tutorial.
Can anybody help me?

Addicted
Ram kiran Pachigolla

4 Answers

Q: Is it possible to parse a large XML file of 800 MB using a SAX parser?

A: Yes, of course!

The problem isn't SAX. SAX is actually an ideal choice for handling large files.

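Your stack trace shows that the error occurred while allocating an ArrayList in the JourneyPatternSections constructor (JourneyPatternSections.java:21), which your startElement handler calls.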

How big is it?

How big are other structures?

Do you actually need to store all the data you're allocating space for?

Are you running your program with any VM flags to allocate more memory?

How much memory does your PC have? Can you run it on a PC that supports more memory? A 64-bit PC?

Are you using a 64-bit JVM?

SUGGESTION: Download and try out VisualVM to troubleshoot the problem at the code level.

You'll probably find that you're allocating far more data than you intended to.

IMHO...

paulsm4

Increase your heap size, e.g., launch the JVM with -Xmx1g.

See this blog.
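To confirm that the flag actually took effect, you can ask the running JVM what it thinks its limits are. This is only a minimal sketch: the class name HeapCheck is made up for illustration, and the sun.arch.data.model property is HotSpot-specific, so it may come back as null on other JVMs.

```java
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx if set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        // Architecture as reported by the JVM.
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        // Assumption: only HotSpot-style JVMs set this; "32" or "64" when present.
        System.out.println("sun.arch.data.model: " + System.getProperty("sun.arch.data.model"));
    }
}
```

Run it with the same options as your parser, for example `java -Xmx1g HeapCheck`, and compare the reported value with what you asked for.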

Cephalopod

SAX is going to be your best mechanism for parsing a large file. DOM parsing loads the entire document into memory, which is where you'd run into problems. Chances are you are having issues because you are trying to store everything in a collection of some sort. SAX is great for parsing the XML, dealing with each piece as it arrives, and moving on.
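A rough sketch of that "deal with it and move on" style, in case it helps. Everything here is an assumption rather than your actual code: the class StreamingSections and the SectionSink callback are hypothetical, and I am guessing at the element names (JourneyPatternSection with an id attribute, JourneyPatternTimingLink) from your stack trace and the TransXChange schema. The point is that the handler keeps only the state for the section currently being read and hands each finished record to a callback instead of adding it to a list.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingSections {

    /** Hypothetical callback: replace with a database insert, file write, etc. */
    interface SectionSink {
        void accept(String id, int linkCount);
    }

    static class SectionHandler extends DefaultHandler {
        private final SectionSink sink;
        private String currentId;   // id of the section being read, or null
        private int linkCount;      // timing links seen in the current section

        SectionHandler(SectionSink sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("JourneyPatternSection".equals(qName)) {
                currentId = attrs.getValue("id");
                linkCount = 0;
            } else if ("JourneyPatternTimingLink".equals(qName) && currentId != null) {
                linkCount++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("JourneyPatternSection".equals(qName)) {
                // Hand the finished record off, then forget it.
                sink.accept(currentId, linkCount);
                currentId = null;
            }
        }
    }

    static void parse(InputStream in, SectionSink sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new SectionHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        // Tiny inline document standing in for the real 800 MB file.
        String xml = "<JourneyPatternSections>"
                + "<JourneyPatternSection id=\"JPS1\"><JourneyPatternTimingLink/></JourneyPatternSection>"
                + "<JourneyPatternSection id=\"JPS2\"/>"
                + "</JourneyPatternSections>";
        parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                (id, n) -> System.out.println(id + ": " + n + " timing link(s)"));
    }
}
```

For the real file you would pass a FileInputStream (ideally buffered) instead of the in-memory string. Memory use then depends on the size of one section, not on the size of the whole file.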

digitaljoel

The error occurs while your code is creating a data structure. You need to either reduce how much memory you are using or increase the amount of memory your program has.

One GB isn't that much these days. If you can give it 4 to 16 GB, processing the file will be much simpler.

Peter Lawrey