I would like to extract all text elements that appear as direct children of the root node. I've had a glance at Java's standard SAX facilities using DefaultHandler, but it doesn't seem to be path aware.

The problem is restricting the result to first-level nodes, not filtering for text nodes.

Is there a non-DOM approach to do this? (Note that the node names are not known in advance.)

[EDIT]

Sample input

<root>
   <a>text1</a>
   <b>text2</b>
   <c>text3</c>
   <nested>
       <d>not_text4</d>
       ...
   </nested>
   ...
</root>

Sample output

Map<String, String> map := {
    {a, text1}
    {b, text2}
    {c, text3}
}

Currently solved with a DOM-oriented workaround, although there are libraries that offer a subset of XPath expressions for SAX / StAX.

Johan Sjöberg

2 Answers

SAX and StAX indeed aren't path aware by nature as they're event oriented. While it's certainly possible to implement a handler that tracks parsing level, you're probably better off with XPath.
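
For reference, a handler that tracks parsing level could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `FirstLevelText` and the `extract` helper are made up for this example, a first-level element counts as a "text element" only if it contains no child elements, and a repeated element name overwrites the earlier entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of elements that are direct children of the root.
public class FirstLevelText extends DefaultHandler {
    private final Map<String, String> result = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;            // 0 = outside root, 1 = inside root, 2 = inside a child of root
    private boolean hasChildElements; // set when the current first-level element contains elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (depth == 2) {
            // Entering a first-level element: reset the per-element state.
            text.setLength(0);
            hasChildElements = false;
        } else if (depth > 2) {
            // Anything deeper means the first-level element is not text-only.
            hasChildElements = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        if (depth == 2) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 2 && !hasChildElements) {
            result.put(qName, text.toString().trim());
        }
        depth--;
    }

    // Assumption: the input is passed as a String; a stream-based InputSource works the same way.
    public static Map<String, String> extract(String xml) {
        FirstLevelText handler = new FirstLevelText();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.result;
    }
}
```

Since the only state is a depth counter and one buffer, memory use stays independent of document size, and the element names do not need to be known in advance.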

A somewhat more complex tactic might be to write an XSLT transform that retains only the elements you're after and then process the result using SAX or StAX.
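
A rough sketch of such a transform, driven from Java through `javax.xml.transform`, is shown below. The stylesheet, the class name `FirstLevelFilter` and the `filter` helper are illustrative assumptions, not part of the original answer. The stylesheet keeps the root element and copies only those children that have no child elements of their own. Note that XSLT 1.0 processors typically build their own in-memory tree of the input, so this is unlikely to save memory compared with DOM.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: reduces a document to its root plus text-only first-level children.
public class FirstLevelFilter {
    // Illustrative stylesheet: match the root element, copy it, and copy
    // only child elements that contain no elements themselves.
    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "  <xsl:strip-space elements='*'/>"
        + "  <xsl:template match='/*'>"
        + "    <xsl:copy>"
        + "      <xsl:copy-of select='*[not(*)]'/>"
        + "    </xsl:copy>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    public static String filter(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transform failed", e);
        }
    }
}
```

To feed the filtered document straight into a SAX handler instead of a string, the `StreamResult` could be swapped for a `javax.xml.transform.sax.SAXResult` wrapping that handler.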

Don Roby
  • I'm afraid you are right. The good news is that there might be a way to do [streaming XPath](http://stackoverflow.com/questions/996103/streaming-xpath-evaluation) – Johan Sjöberg Mar 23 '11 at 11:32
  • @Johan - yes, that might work. Also see my update for another possibility that wouldn't require loading the whole doc. – Don Roby Mar 23 '11 at 11:39

This adds a little overhead, but you get a powerful tool for working with XML. Try JAXB.

Vladimir Ivanov
  • Thanks, although I need to efficiently extract a `List` of all *first-level* elements in an XML document rather than convert them to Java objects. – Johan Sjöberg Mar 23 '11 at 10:24