Suggestion to parse this XML in Java

Question

Not new to Java; but relatively new to XML-parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I am also not an XML-pro.

My particular problem is this... I have been given an XML-document which I cannot modify and from which I need only to parse random bits of it into Java objects. Sheer speed is not much of a factor so long as it's reasonable. Likewise, memory-footprint need not be absolutely optimal either, just not insane. I only need to read through the document one time to parse it, after that I'll be throwing it in the bitbucket and just using my POJO.

So, I'm open to suggestion... which tool would you use?
And, would you kindly suggest a bit of starter-code to address my particular need?

Here's a snippet of sample XML and the associated POJO I'm trying to craft:

<xml>
  <item id="...">
    ...
  </item>
  <metadata>
    <resources>

      <resource>
        <ittype>Service_Links</ittype>
        <links>
          <link>
            <path>http://www.stackoverflow.com</path>
            <description>Stack Overflow</description>
          </link>
          <link>
            <path>http://www.google.com</path>
            <description>Google</description>
          </link>
        </links>
      </resource>

      <resource>
        <ittype>Article_Links</ittype>
        <links>
          ...
        </links>
      </resource>

      ...

    </resources>
  </metadata>
</xml>


public class MyPojo {

    @Attribute(name="id")
    @Path("item")
    public String id;

    @ElementList(entry="link")
    @Path("metadata/resources/resource/links")
    public List<Link> links;
}

NOTE: this question was originally spawned by this question with me trying to solve it using SimpleXml; I'm to the point where I thought maybe someone could suggest a different route to solving the same problem.

Also Note: I'm really hoping for a CLEAN solution... by which I mean, using annotations and/or xpath with the least amount of code... the last thing I want is huge class file with huge unwieldy methods... THAT, I already have... I'm trying to find a better way.

:D

What's wrong with SAXParser or DocumentBuilder? Also, please accept some answers to previous questions. — Jim Garrison, Oct 08 '12 at 22:59
@JimGarrison Maybe nothing! :P Problem is, I've spent enough time monkeying around with things I don't know only to find they don't quite go the full mile that I thought I better ask someone who knows. I'm looking at SAXParser right now but if you have a link or some sample code to demo how I might go about it, that would be a boon. — Bane, Oct 08 '12 at 23:03
Note: I'm not sure the "-1" was warranted... I gave a very clear description and sample code demonstrating my problem and I referred to another post of similar clarity; both of them explained that I've been trying other technologies and can't find a clean fit. Yeah, I don't know all the tech... I said that up-front. — Bane, Oct 08 '12 at 23:05
The ones which were helpful to you. I briefly looked at some of your previous questions. Personally I think one is subjective and should be deleted. Most are old enough that if someone did not answer the question, you probably have by now. So either accept a good answer or provide your own to accept. — Tim Bender, Oct 08 '12 at 23:14
@TimBender I clicked through them just now and the ones I haven't accepted were either not answered sufficiently or I have not found a good solution yet. There is one which was just answered a few days ago but which I haven't been able to test yet. I guess I could delete the subjective one but otherwise I don't feel compelled to accept answers that might misdirect the next guy coming behind me. — Bane, Oct 08 '12 at 23:16
Does an XSD for the xml exist? If that's the case you could also give JaxB a try. — daniel, Oct 08 '12 at 23:19
@daniel no it doesn't; from what I'm getting, it's just an XML file similar to the above. — Bane, Oct 08 '12 at 23:20
@JimGarrison I've looked at SAXParser but from what I can tell that's going to generate some pretty nasty verbose code... I could be missing something new in the API though... does that sound about right or no? — Bane, Oct 08 '12 at 23:23
If you use DocumentBuilder with XPath you should be able to extract what you need without too much work. BTW, not my downvote :-) — Jim Garrison, Oct 08 '12 at 23:41

score 1 · Accepted Answer · answered Nov 05 '12 at 16:22

OK, so I settled on a solution that (to me) seemed to address my needs in the most reasonable way. My apologies to the other suggestions, but I just liked this route better because it kept most of the parsing-rules as annotations and what little procedural-code I had to write was very minimal.

I ended up going with JAXB; initially I thought JAXB would either create XML from a Java-class or parse XML into a Java-class but only with an XSD. Then I discovered that JAXB has annotations that can parse XML into a Java-class without an XSD.

The XML-file I'm working with is huge and very deep, but I only need bits and pieces of it here and there; I was worried that keeping track of what maps to where would become very difficult later on. So I chose to structure a tree of folders modeled after the XML... each folder maps to an element and in each folder is a POJO representing that actual element.

Problem is, sometimes there is an element that has a child-element several levels down with a single property I care about. It would be a pain to create 4 nested-folders and a POJO for each just to get access to a single property. But that's how you do it with JAXB (at least, from what I can tell); once again I was in a corner.

Then I stumbled on EclipseLink's JAXB-implementation: MOXy. MOXy has an @XmlPath annotation that I could place in the parent POJO and use to navigate several levels down to a single property without creating all those folders and element-POJOs. Nice.

So I created something like this: (note: I chose to use getters for cases where I need to massage the value)

// maps to the root-"xml" element in the file
@XmlRootElement( name="xml" )
@XmlAccessorType( XmlAccessType.FIELD )
public class Xml {

    // this is standard JAXB
    @XmlElement
    private Item item;
    public Item getItem() {    
        return this.item;
    }

    ...
}

// maps to the "<xml><item>"-element in the file
public class Item {

    // standard JAXB; maps to "<xml><item id="...">"
    @XmlAttribute              
    private String id;
    public String getId() {
        return this.id;
    }

    // getting an attribute buried deep down
    // MOXY; maps to "<xml><item><rating average="...">"
    @XmlPath( "rating/@average" )    
    private Double averageRating;
    public Double getAverageRating() {
        return this.averageRating;
    }

    // getting a list buried deep down
    // MOXY; maps to "<xml><item><service><identification><aliases><alias.../><alias.../>"
    @XmlPath( "service/identification/aliases/alias/text()" )
    private List<String> aliases;
    public List<String> getAliases() {
        return this.aliases;
    }

    // using a getter to massage the value
    @XmlElement(name="dateforindex")
    private String dateForIndex;
    public Date getDateForIndex() {
        // logic to parse the string-value into a Date
    }

}

Also note that I took the route of separating the XML-object from the model-object I actually use in the app. Thus, I have a factory that transforms these crude objects into much more robust objects which I actually use in my app.
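
The factory step mentioned above is not shown in the answer, so here is a minimal sketch of what it might look like. Everything in it is an assumption for illustration: `RawItem` stands in for the JAXB-bound `Item` class, `CatalogItem` and `CatalogItemFactory` are hypothetical names, and the date string is assumed to be ISO-8601 (`yyyy-MM-dd`), which the original does not state. It uses `java.time.LocalDate` rather than `java.util.Date`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-in for the JAXB-bound class: raw values exactly as unmarshalled (hypothetical). */
final class RawItem {
    final String id;
    final Double averageRating;  // null when the attribute is absent
    final List<String> aliases;  // null when no <alias> elements were found
    final String dateForIndex;   // ASSUMPTION: ISO-8601 text such as "2012-10-08"

    RawItem(String id, Double averageRating, List<String> aliases, String dateForIndex) {
        this.id = id;
        this.averageRating = averageRating;
        this.aliases = aliases;
        this.dateForIndex = dateForIndex;
    }
}

/** Hypothetical application-side model: immutable, with typed values. */
final class CatalogItem {
    final String id;
    final double averageRating;
    final List<String> aliases;
    final LocalDate dateForIndex; // null when the source had no date

    CatalogItem(String id, double averageRating, List<String> aliases, LocalDate dateForIndex) {
        this.id = id;
        this.averageRating = averageRating;
        this.aliases = aliases;
        this.dateForIndex = dateForIndex;
    }
}

/** Converts the crude XML-side object into the model object used by the app. */
final class CatalogItemFactory {
    static CatalogItem fromRaw(RawItem raw) {
        if (raw.id == null || raw.id.isEmpty()) {
            throw new IllegalArgumentException("item id is required");
        }
        // Missing rating defaults to 0.0; this default is an illustrative choice.
        double rating = raw.averageRating == null ? 0.0 : raw.averageRating;
        // Copy the list so the model cannot be changed through the raw object.
        List<String> aliases = raw.aliases == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(new ArrayList<>(raw.aliases));
        // LocalDate.parse throws DateTimeParseException on malformed input.
        LocalDate date = raw.dateForIndex == null ? null : LocalDate.parse(raw.dateForIndex);
        return new CatalogItem(raw.id, rating, aliases, date);
    }
}
```

Keeping the conversion in one place means the annotated classes stay a plain mirror of the file, and any validation or defaulting rules live in the factory instead of in getters.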

score 0 · Answer 2 · answered Oct 08 '12 at 23:52

If your XML documents are relatively small (as appears to be the case here), I would use the DOM framework and XPath class. Here is some boilerplate DOM/XPath code from one of my tutorials:

File xmlFile = ...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlFile);

XPath xp = XPathFactory.newInstance().newXPath();
String value = xp.evaluate("/path/to/element/text()", doc);
// .. reuse xp to get other values as required

In other words, basically you:

get your XML into a Document object, via a DocumentBuilder;
create an XPath object;
repeatedly call XPath.evaluate(), passing in the path of the element(s) required and your Document.

As you see, there's a little bit of fiddliness in getting hold of your Document object and like all good XML APIs, it throws a plethora of silly pointless checked exceptions. But apart from that, it's fairly no-nonsense for parsing simple small to medium XML documents whose structure is relatively fixed.
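
Applied to the XML in the question, the same approach might look like the sketch below. The class and method names (`XPathLinkExtractor`, `Link`, `itemId`, `links`) are made up for illustration; only the element names come from the sample document. The `ittype` value is passed in as an XPath variable rather than concatenated into the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathLinkExtractor {

    /** Hypothetical holder for one <link> element. */
    static final class Link {
        final String path;
        final String description;

        Link(String path, String description) {
            this.path = path;
            this.description = description;
        }
    }

    /** Parses the whole document into memory (fine for small to medium files). */
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    /** Reads the id attribute of /xml/item; returns "" if it is absent. */
    static String itemId(Document doc) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/xml/item/@id", doc);
    }

    /** Collects the links of the <resource> whose <ittype> equals the given value. */
    static List<Link> links(Document doc, String ittype) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        // Bind $type to the requested ittype instead of building the expression by hand.
        xp.setXPathVariableResolver(name -> "type".equals(name.getLocalPart()) ? ittype : null);
        NodeList nodes = (NodeList) xp.evaluate(
                "/xml/metadata/resources/resource[ittype = $type]/links/link",
                doc, XPathConstants.NODESET);

        List<Link> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node link = nodes.item(i);
            // Relative expressions are evaluated against the current <link> node.
            result.add(new Link(xp.evaluate("path", link), xp.evaluate("description", link)));
        }
        return result;
    }
}
```

A caller would parse once with `XPathLinkExtractor.parse(xml)` and then call `links(doc, "Service_Links")` or `links(doc, "Article_Links")` as needed.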

score 0 · Answer 3 · answered Oct 09 '12 at 08:11

You can use a SAX or StAX parser. If you can afford some more memory, you can also use a DOM parser. I would suggest StAX as the best fit for you.
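
As a rough illustration of the StAX route, the sketch below pulls every `<path>` that sits inside a `<link>` out of a document like the one in the question. The class and method names (`StaxLinkReader`, `linkPaths`) are invented for this example; it uses only the standard `javax.xml.stream` API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxLinkReader {

    /** Streams through the document once and returns the text of each <link>/<path>. */
    static List<String> linkPaths(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> paths = new ArrayList<>();
        boolean insideLink = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("link".equals(name)) {
                        insideLink = true;
                    } else if (insideLink && "path".equals(name)) {
                        // getElementText() consumes the text and stops at </path>.
                        paths.add(reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "link".equals(reader.getLocalName())) {
                    insideLink = false;
                }
            }
        } finally {
            reader.close();
        }
        return paths;
    }
}
```

Unlike DOM, nothing beyond the current event is held in memory, which is why this style suits very large files; the cost is that you track your position in the document yourself (here with the `insideLink` flag).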

Sumit Desai
