
I am trying to accomplish the following:

  1. load a document (done)
  2. go through the document depth first and use a DefaultHandler from the JDK to do some work

The reason is that I already have a handler, which I currently use with a SAX parser. I now want to use the same handler on the in-memory document.

This is useful because I have to run the handler multiple times. For large documents I want to use SAX; for small ones I want to use the in-memory representation.

Thanks!

Don Roby
Jonny5

1 Answer


The quickest way (quickest to code, that is) to accomplish this is to write the portion of the in-memory document that you want to process into a string, wrap that string in a StringReader, and pass the reader to a SAX parser together with your handler.
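A minimal sketch of that reparse step, using only JDK classes. It assumes the fragment has already been serialized to a string; `StringReparse`, `NameCollector` and `elementNames` are hypothetical names standing in for your own handler and code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class StringReparse {
    // Hypothetical stand-in for the existing handler: records element names.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    // Feeds an already-serialized XML fragment through a SAX parser.
    static List<String> elementNames(String xml) {
        NameCollector handler = new NameCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not reparse fragment", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><c>text</c></a>"));
    }
}
```

The cost is one serialization plus one full parse for every fragment you process this way.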

What you really need is to generate SAX events based on your data and feed those events to the handler. You can do that by getting the data into the form of an InputSource or Reader and then using that in your parse, which is the tactic described above, or you can simply simulate the SAX events by directly calling the methods of the ContentHandler you've already written. But calling them in the right order and feeding them the right data to accomplish what you need may be painful if your document is at all complex.
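To give an idea of what simulating the events looks like, here is a rough sketch that walks a tree depth first and calls the ContentHandler methods by hand. It uses the JDK's W3C DOM as a stand-in for your in-memory model (an assumption; your tree API will differ), ignores namespaces, comments and processing instructions, and passes the node name as both local name and qualified name. `DomWalker`, `fire`, `visit` and `trace` are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class DomWalker {
    // Emits startDocument, the events for the root element's subtree, then endDocument.
    static void fire(Document doc, ContentHandler h) throws SAXException {
        h.startDocument();
        visit(doc.getDocumentElement(), h);
        h.endDocument();
    }

    // Depth-first walk: start event, children in order, end event.
    static void visit(Node node, ContentHandler h) throws SAXException {
        short type = node.getNodeType();
        if (type == Node.ELEMENT_NODE) {
            AttributesImpl atts = new AttributesImpl();
            NamedNodeMap map = node.getAttributes();
            for (int i = 0; i < map.getLength(); i++) {
                Node a = map.item(i);
                // Simplification: no namespace URI, every attribute typed as CDATA.
                atts.addAttribute("", a.getNodeName(), a.getNodeName(), "CDATA", a.getNodeValue());
            }
            String name = node.getNodeName();
            h.startElement("", name, name, atts);
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                visit(c, h);
            }
            h.endElement("", name, name);
        } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            char[] chars = node.getNodeValue().toCharArray();
            h.characters(chars, 0, chars.length);
        }
        // Other node types (comments, processing instructions) are skipped here.
    }

    // Demo helper: records the events a handler receives as one string.
    static String trace(String xml) {
        final StringBuilder log = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String ln, String qn, Attributes a) {
                log.append("start:").append(qn);
                for (int i = 0; i < a.getLength(); i++) {
                    log.append('@').append(a.getQName(i)).append('=').append(a.getValue(i));
                }
                log.append(' ');
            }

            @Override
            public void endElement(String uri, String ln, String qn) {
                log.append("end:").append(qn).append(' ');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append("text:").append(ch, start, length).append(' ');
            }
        };
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            fire(doc, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace("<a x=\"1\"><b>hi</b></a>"));
    }
}
```

Even this small version has to make decisions about attributes, text nodes and namespaces, which is where the pain comes from once the handler relies on more of the SAX contract.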

If Dom4J provides a way to create an InputSource based on a node in your document structure, that will be the easiest to use, and likely much more efficient than writing it to a string first.
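For comparison, with the JDK's own W3C DOM (not dom4j, so treat this as an analogy rather than your exact case) an identity Transformer can replay a node as SAX events into an existing handler without going through a string. `DomToSax`, `replay` and `countElements` are hypothetical names. dom4j also appears to ship an `org.dom4j.io.SAXWriter` class aimed at the same job; check its Javadoc before relying on that.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class DomToSax {
    // Replays a DOM node (document or subtree) as SAX events into the handler.
    static void replay(Node node, ContentHandler handler) {
        try {
            // newTransformer() with no stylesheet gives an identity transform.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(node), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException("could not replay node as SAX events", e);
        }
    }

    // Demo helper: counts the startElement events the handler receives.
    static int countElements(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String ln, String qn, Attributes a) {
                count[0]++;
            }
        };
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replay(doc, handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countElements("<root><item/><item/><item/></root>"));
    }
}
```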

You might do better to extract the portions of your ContentHandler that do the actual work into a separate class that you can use both from the ContentHandler and from a new class that walks the internal tree.
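Roughly, that split could look like the sketch below. The "work" here is an invented example (counting elements by name), the tree is again the JDK's W3C DOM standing in for your own model, and `SharedWork`, `Worker`, `SaxAdapter` and `walk` are made-up names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SharedWork {
    // The actual work, free of any SAX or tree types.
    static class Worker {
        final Map<String, Integer> counts = new TreeMap<>();

        void element(String name) {
            counts.merge(name, 1, Integer::sum);
        }
    }

    // Thin SAX adapter: translates events into Worker calls.
    static class SaxAdapter extends DefaultHandler {
        private final Worker worker;

        SaxAdapter(Worker worker) {
            this.worker = worker;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            worker.element(qName);
        }
    }

    // Thin tree adapter: depth-first walk that makes the same Worker calls.
    static void walk(Node node, Worker worker) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            worker.element(node.getNodeName());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, worker);
        }
    }

    // Streaming path, intended for large documents.
    static Map<String, Integer> viaSax(String xml) {
        Worker worker = new Worker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new SaxAdapter(worker));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return worker.counts;
    }

    // In-memory path, intended for small documents.
    static Map<String, Integer> viaTree(String xml) {
        Worker worker = new Worker();
        try {
            walk(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))), worker);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return worker.counts;
    }

    public static void main(String[] args) {
        String xml = "<a><b/><b>text</b></a>";
        System.out.println(viaSax(xml).equals(viaTree(xml)));
    }
}
```

Neither path has to fake the other's event model, and the worker can be unit tested without any XML at all.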

Don Roby
  • But this would be very slow, no? – Jonny5 Jun 12 '12 at 10:34
  • Might be slow, depending on how much you're reparsing. But given that you've already parsed, you're being a bit inefficient by the decision to reparse portions. – Don Roby Jun 12 '12 at 10:56
  • That's true. I could actually add several handlers and do more stuff in parallel. I think I'm going to extract the part of the ContentHandler doing the work, as you suggest. Thanks! – Jonny5 Jun 12 '12 at 13:01