1

I am trying to parse an XML file with the VTD-XML library (VTDGen). It worked perfectly until I tried to parse an XML file larger than 1 GB.

This is how I parse it:

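            VTDGen vg = new VTDGen();
            in = new SmbFileInputStream(fileToGet);
            // The int cast only works for files smaller than 2 GB.
            byte[] b = new byte[(int) fileToGet.length()];
            // A single read() may return early; readFully() fills the whole buffer.
            new java.io.DataInputStream(in).readFully(b);
            vg.setDoc(b);
            vg.parse(true); // true = namespace-aware parsing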

This is the error I get:

com.ximpleware.ParseException: Other error: file size too big >=1GB 

Is there any way to increase the size limit, or should I switch to another parser?

Thank you in advance.

Tobias Roland
draford

2 Answers

1

Read about the limitations of VTD-XML:

  • Upper limits of various fields: (1) For starting tags (the max Qname length is 2048; the prefix 512), overflow conditions result in parse exceptions. For other tokens (upper limit is 1M), one can potentially break a long token into multiple shorter ones. (2) Depth field overflow condition results in parse exceptions. (3) Starting offset: Currently the biggest document supported is 1G characters (1G bytes or 2G bytes, depending on actual document encoding).

From http://vtd-xml.sourceforge.net/userGuide/0.html

Mathis Hobden
  • I guess I have to switch parsers. Any recommendation for a parser that handles files over 1 GB? – draford Dec 20 '13 at 19:31
  • You need to use a SAX-based parser, because SAX processes the data sequentially, unlike DOM-based parsers, which load the entire content into memory. See http://stackoverflow.com/questions/3969713/java-xml-parser-for-huge-files – Mathis Hobden Dec 20 '13 at 23:56
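
A minimal sketch of the SAX approach suggested in the comment above, using only the parser that ships with the JDK. The class name `SaxElementCounter`, the method `countElements`, and the sample document are hypothetical and only for illustration. The handler sees one event at a time, so the document is never held in memory as a whole.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxElementCounter {
    // Streams through the document and counts start tags whose qualified name matches.
    static long countElements(InputStream in, String name) throws Exception {
        final long[] count = {0};
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory sample; a real run would pass a file or network stream instead.
        String xml = "<items><item/><item/><other/></items>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "item")); // prints 2
    }
}
```

In the question's setup, the `SmbFileInputStream` could presumably be passed to `countElements` directly, which also avoids allocating a byte array the size of the file.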
0

There are two ways to get around the issue:

  1. Use extended VTD-XML. It is part of the vtd-xml distribution and shares a very similar API, but it is a standalone product.
  2. Turn off namespace awareness. That raises the maximum document size from 1 GB to 2 GB.
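
As a rough sketch, the two options can be turned into a size check that runs before parsing. The class `VtdSizeCheck`, the `Strategy` enum, and the exact byte thresholds are assumptions taken from the 1 GB and 2 GB figures above, not from the VTD-XML API. In the question's code, turning off namespace awareness would correspond to calling `vg.parse(false)` instead of `vg.parse(true)`.

```java
import java.io.File;

class VtdSizeCheck {
    // Assumed threshold: 1 GB taken as 2^30 bytes, matching the ">=1GB" error message.
    static final long ONE_GB = 1L << 30;

    enum Strategy { STANDARD, NAMESPACE_UNAWARE, EXTENDED }

    // Picks the least invasive option that should still fit the document.
    static Strategy choose(long fileSizeBytes) {
        if (fileSizeBytes < ONE_GB) {
            return Strategy.STANDARD;           // regular VTDGen, namespace-aware
        }
        if (fileSizeBytes < 2 * ONE_GB) {
            return Strategy.NAMESPACE_UNAWARE;  // regular VTDGen, namespace awareness off
        }
        return Strategy.EXTENDED;               // extended VTD-XML
    }

    public static void main(String[] args) {
        // Uses the file given on the command line, or a 1.5 GB placeholder size.
        long size = args.length > 0 ? new File(args[0]).length() : 1_500_000_000L;
        System.out.println(choose(size));
    }
}
```

Note that a Java `byte[]` cannot hold more than about 2^31 - 1 bytes, so documents near or above 2 GB would need the extended option regardless of namespace settings.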
vtd-xml-author