I am evaluating VTD-XML as a possible solution for a large data migration project. The input data is in XML format, and if VTD-XML is viable it would save a lot of development time. I ran the example "Process Huge XML Documents (Bigger than 2GB)" from the VTD-XML website: http://vtd-xml.sourceforge.net/codeSample/cs12.html.
I can successfully process a 500 MB file, but I get the dreaded java.lang.OutOfMemoryError: Java heap space with a 4 GB file.
- JVM Arguments: -Xmn100M -Xms500M -Xmx2048M.
- JVM Arguments: -Xmn100M -Xms500M -Xmx4096M.
And with Maven:
- set MAVEN_OPTS=-Xmn100M -Xms500M -Xmx2048M
- set MAVEN_OPTS=-Xmn100M -Xms500M -Xmx4096M
NOTE: I have tested it with various combinations of the JVM arguments.
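To rule out the possibility that the settings above are not reaching the JVM that runs the example (for instance, if the Maven build forks a separate JVM, MAVEN_OPTS may not apply to it), the effective heap limit can be printed from inside the process. This is only a minimal sketch: the class name HeapCheck is hypothetical, and the sun.arch.data.model property is HotSpot-specific and may print null on other JVMs.

```java
final class HeapCheck {
    /** Returns the maximum heap the running JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Should be close to the -Xmx value if the arguments were picked up.
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        // HotSpot-specific property; expected to report 32 or 64 on Oracle/OpenJDK builds.
        System.out.println("JVM data model: " + System.getProperty("sun.arch.data.model"));
    }
}
```

Running this once from Eclipse and once through Maven, with the same arguments as the failing example, shows whether both launch paths really get the intended -Xmx value.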
I have studied the VTD-XML site and API docs and browsed numerous questions here and elsewhere. All the answers point to setting the JVM memory higher or adding more physical memory. The VTD-XML website refers to memory usage of 1.3x-1.5x the XML file size, but when using a 64-bit JVM one should be able to process files much larger than the available memory. Surely it is not feasible to add 64 GB of memory to process a 35 GB XML file.
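To put the 1.3x-1.5x figure into numbers, here is a rough back-of-the-envelope sketch. It rests on an assumption that is mine, not something the documentation confirms for the extended (huge) API: that the figure consists of the document itself (1.0x) plus the parsed index (0.3x-0.5x), and that with memory mapping only the index part has to live on the Java heap. The class and method names are hypothetical.

```java
final class VtdHeapEstimate {
    /**
     * Estimated index size in GB, assuming (unverified) that the quoted total
     * factor is 1.0x for the document plus (factor - 1.0) for the index.
     */
    static double indexGigabytes(double xmlGigabytes, double totalFactor) {
        return xmlGigabytes * (totalFactor - 1.0);
    }

    public static void main(String[] args) {
        double[] xmlSizes = {0.5, 4.0, 35.0}; // file sizes mentioned in this question, in GB
        for (double size : xmlSizes) {
            System.out.printf("%.1f GB XML -> roughly %.2f to %.2f GB of index%n",
                    size, indexGigabytes(size, 1.3), indexGigabytes(size, 1.5));
        }
    }
}
```

Under that assumption the 4 GB file would need somewhere around 1.2 to 2.0 GB for the index alone, and the 35 GB file around 10.5 to 17.5 GB, which is why I am unsure whether raising -Xmx can ever be enough on this machine.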
Environment:
Windows 7 64-bit, 6 GB RAM (all other apps closed, 85% of memory available)
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
Eclipse Indigo
Maven 2
Running the example from either Eclipse or Maven throws the OutOfMemoryError.
Example code:
import com.ximpleware.extended.VTDGenHuge;
import com.ximpleware.extended.VTDNavHuge;
import com.ximpleware.extended.XMLMemMappedBuffer;
public class App {
    /* first read is the longer version of loading the XML file */
    public static void first_read() throws Exception {
        XMLMemMappedBuffer xb = new XMLMemMappedBuffer();
        VTDGenHuge vg = new VTDGenHuge();
        xb.readFile("C:\\Temp\\partial_dbdump.xml");
        vg.setDoc(xb);
        vg.parse(true);
        VTDNavHuge vn = vg.getNav();
        System.out.println("text data ===>" + vn.toString(vn.getText()));
    }
    /* second read is the shorter version of loading the XML file */
    public static void second_read() throws Exception {
        VTDGenHuge vg = new VTDGenHuge();
        if (vg.parseFile("C:\\Temp\\partial_dbdump.xml", true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vn = vg.getNav();
            System.out.println("text data ===>" + vn.toString(vn.getText()));
        }
    }
    public static void main(String[] s) throws Exception {
        first_read();
        //second_read();
    }
}
Error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.ximpleware.extended.FastLongBuffer.append(FastLongBuffer.java:209)
    at com.ximpleware.extended.VTDGenHuge.writeVTD(VTDGenHuge.java:3389)
    at com.ximpleware.extended.VTDGenHuge.parse(VTDGenHuge.java:1653)
    at com.epiuse.dbload.App.first_read(App.java:14)
    at com.epiuse.dbload.App.main(App.java:29)
Any help would be appreciated.