I'm developing an Android app and I need to traverse an XML file.

I need to traverse the XML both backward and forward from a given position. That is, I start parsing the file, but at any moment I can stop and either go backward or continue.

I was thinking of using DOM: with a for loop I could control the traversal and do what I want. But the XML file I want to parse is at least 8 MB, and since DOM is very memory intensive, it doesn't seem to be a good solution.

One solution would be not to load the whole document for parsing: split the document into several parts, load only one part into memory, and parse it. When I reach the end of that part, I load the next one. The same applies when I want to rewind.

My question is: how can I split the file into several pieces, given that it is an XML file and the child elements are not all the same size?

For example:

<root>
   <child time="A">
     <sub1>1</sub1>
     <sub2>2</sub2>
   </child>

   <child time="B">
     <sub1>3</sub1>
   </child>

   <child time="C">
     <sub2>4</sub2>
   </child>
</root>

As you can see, the child elements have different sizes, and I don't know how to split a file like this into several parts efficiently.

Can anyone give me a clue?

Best regards.

João Nunes

1 Answer

With XML you typically have to make a choice. DOM is memory intensive, SAX cannot go backward, and hand-written parsers are tedious to create and maintain.

If you can afford to consume tens of MB of memory, simply go with DOM.

The decision between SAX and manual parsing depends on how often you actually need to go backward and whether you can afford a delay at that point.

If you cannot, you will have to implement a hand-written parser with precomputation. Precomputation can be done, for example, with SAX in conjunction with a CountingInputStream, or manually. You would precompute the starting and ending offsets of each n-th child element and store them as an array of intervals like these:

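public class Interval {
    public int startOffset; // byte offset where the child element starts
    public int endOffset;   // byte offset where the child element ends
}

// One entry for each n-th child element
Interval[] precomputedOffsets;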

The value of n, the page size, could be something like 20. Balance that to control the tradeoff between memory consumption and performance of going back.
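As an illustration of the manual variant of this precomputation, here is a minimal sketch that scans the file once and records the byte offset of every n-th child start tag. The class name ChildOffsetIndex and the method pageOffsets are made up for this example, it keeps only start offsets (a simplification of Interval), and it assumes a UTF-8 or ASCII file in which the text "<child" never appears inside comments or CDATA sections.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChildOffsetIndex {
    // Start of the repeated element's tag; "child" comes from the question's example.
    private static final byte[] OPEN_TAG = "<child".getBytes(StandardCharsets.US_ASCII);

    /** Reads the stream once and returns the byte offset of every n-th <child> start tag. */
    static long[] pageOffsets(InputStream in, int n) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;      // number of bytes read so far
        int matched = 0;   // how many leading bytes of OPEN_TAG currently match
        int children = 0;  // <child> start tags seen so far
        int b;
        while ((b = in.read()) != -1) {
            pos++;
            if (matched == OPEN_TAG.length) {
                // The element name must end here, otherwise this is e.g. <children>.
                if (b == '>' || b == '/' || Character.isWhitespace(b)) {
                    if (children % n == 0) {
                        offsets.add(pos - 1 - OPEN_TAG.length);
                    }
                    children++;
                }
                matched = 0;
            }
            if (b == OPEN_TAG[matched]) {
                matched++;
            } else {
                matched = (b == OPEN_TAG[0]) ? 1 : 0;
            }
        }
        long[] result = new long[offsets.size()];
        for (int k = 0; k < result.length; k++) {
            result[k] = offsets.get(k);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<root><child time=\"A\"/><child time=\"B\"/><child time=\"C\"/></root>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(pageOffsets(in, 2))); // prints [6, 40]
    }
}
```

For a real file you would probably wrap the FileInputStream in a BufferedInputStream, since the scan reads one byte at a time. With an 8 MB file and n = 20 the resulting array stays small compared to a DOM tree.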

Now, if you know that you need to go to item i at runtime, you will call reset and skip(precomputedOffsets[i / n].startOffset) on the input stream, then hand-parse past the remaining i % n child elements from there.
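The lookup step could look roughly like the sketch below. It is not the only way to do it: instead of a hand-written parser it uses the JDK's StAX reader (javax.xml.stream), which as far as I know is not part of the Android platform, where XmlPullParser would play a similar role. The class PagedChildReader, the method timeOfChild, and the plain long[] of page start offsets are assumptions made for this example. The stream is expected to be positioned at the start of the document (freshly opened or reset), and the file is assumed to be UTF-8 with root as its root element, as in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PagedChildReader {
    /**
     * Returns the "time" attribute of child number i (0-based), or null if there is none.
     * pageOffsets[k] must hold the byte offset of child number k * n.
     */
    static String timeOfChild(InputStream in, long[] pageOffsets, int n, int i)
            throws IOException, XMLStreamException {
        // Jump to the start of the page that contains child i.
        long toSkip = pageOffsets[i / n];
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                throw new IOException("unexpected end of stream");
            }
            toSkip -= skipped;
        }
        // Put a root start tag in front so the parser sees a well-formed document;
        // the original </root> at the end of the file closes it.
        InputStream fragment = new SequenceInputStream(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)), in);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(fragment);
        int remaining = i % n; // children to pass over before the wanted one
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "child".equals(reader.getLocalName())) {
                    if (remaining-- == 0) {
                        return reader.getAttributeValue(null, "time");
                    }
                }
            }
        } finally {
            reader.close();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><child time=\"A\"/><child time=\"B\"/><child time=\"C\"/></root>"
                .getBytes(StandardCharsets.UTF_8);
        long[] pageOffsets = {6, 40}; // offsets of children 0 and 2 in the string above, n = 2
        System.out.println(timeOfChild(new ByteArrayInputStream(xml), pageOffsets, 2, 1)); // prints B
    }
}
```

Going backward is then the same operation as going forward: reopen or reset the stream and ask for a smaller i. The cost of each jump is bounded by parsing at most n child elements.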

Jirka Hanika