
I am trying to parse a simple XML file. Given the XML string below:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

I only want to extract the string inside <body></body>. I'm using SAXParser with a DefaultHandler. I managed to print the text of every tag by adding a print statement to the characters method of my DefaultHandler, but I'm not sure where and by what this characters method is called, or how to control it.

I know how to detect a certain tag in startElement, but how do I extract the string inside that tag from startElement?

VGR
pandagrammer
  • DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder(); Document doc = db.parse(new ByteArrayInputStream(xml.getBytes())); String bodyText = doc.getElementsByTagName("body").item(0).getTextContent(); – DmitryKanunnikoff Sep 04 '14 at 17:11
  • Try using a DOM parser. It is easier in this case. – DmitryKanunnikoff Sep 04 '14 at 17:12
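A self-contained version of the DOM approach from the comment above might look like the sketch below. The class name DomBodyExample and the method extractBody are illustrative only. One deliberate change from the comment: the string is wrapped in an InputSource over a StringReader instead of calling xml.getBytes(), so the result does not depend on the platform's default charset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomBodyExample {

    // Parses the whole document into a DOM tree, then reads the text of the first <body>.
    static String extractBody(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Reading characters from a StringReader avoids any byte/charset conversion.
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("body").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        System.out.println(extractBody(xml)); // prints: Don't forget me this weekend!
    }
}
```

DOM keeps the whole tree in memory, which is fine for a small document like this one. SAX is the better fit when the input is too large to hold at once.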

4 Answers


According to the SAX DefaultHandler documentation:

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

So the parser may call the characters method once or several times for the text inside a particular element, say "Don't forget me this weekend!", until the whole text has been reported.

Note:

The application must not attempt to read from the array outside of the specified range.

The code below shows how to collect the text inside a single XML element.

// Only the text of the target element (<body>) is collected.
private boolean isTagInScope = false;
private final StringBuilder elementContent = new StringBuilder();

@Override
public void startElement(String namespaceURI, String localName, String qName,
        Attributes attributes) throws SAXException {
    if ("body".equals(qName)) {
        isTagInScope = true;
    }
}

@Override
public void endElement(String namespaceURI, String localName, String qName)
        throws SAXException {
    if ("body".equals(qName)) {
        isTagInScope = false;
    }
}

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    if (isTagInScope) {
        // May be called several times per element: append each chunk,
        // reading only the range [start, start + length).
        elementContent.append(ch, start, length);
    }
}

Once endElement has fired for the closing tag, the elementContent variable holds the entire text between the element's start and end tags.
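To answer the "what calls characters" part of the question: you never call it yourself. SAXParser.parse reads the input and invokes startElement, characters and endElement on your handler as it goes. The sketch below wraps the same flag-based logic in a complete program. The names SaxBodyExample, BodyHandler and extractBody are hypothetical, and the XML is read from a string through an InputSource rather than from a file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxBodyExample {

    // Collects every character chunk reported while the parser is inside <body>.
    static class BodyHandler extends DefaultHandler {
        private boolean inBody = false;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("body".equals(qName)) {
                inBody = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("body".equals(qName)) {
                inBody = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inBody) {
                text.append(ch, start, length);
            }
        }
    }

    static String extractBody(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BodyHandler handler = new BodyHandler();
        // parse() drives the callbacks: the parser calls startElement,
        // characters and endElement on the handler while reading the input.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        System.out.println(extractBody(xml)); // prints: Don't forget me this weekend!
    }
}
```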

BatScream

You can use the javax.xml.xpath APIs in Java SE to extract the text of an element.

Demo Code

import javax.xml.xpath.*;
import org.xml.sax.InputSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        InputSource inputSource = new InputSource("input.xml");
        XPath xPath = XPathFactory.newInstance().newXPath();
        String text = xPath.evaluate("/note/body", inputSource);
        System.out.println(text);
    }

}

Output

Don't forget me this weekend!
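Since the question starts from an XML string rather than a file, the same XPath call can be pointed at a StringReader. This is a minimal sketch under that assumption; the class name XPathStringDemo and the method extractBody are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathStringDemo {

    // Evaluates /note/body against an in-memory XML string.
    static String extractBody(String xml) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Wrapping the string in an InputSource lets evaluate() parse it directly.
        return xPath.evaluate("/note/body", new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        System.out.println(extractBody(xml)); // prints: Don't forget me this weekend!
    }
}
```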
bdoughan

Modified with the insight of @BatScream

The idea is to set a flag when startElement reports the 'body' tag; then, in the characters method, append the text while the flag is true.

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NoteHandler extends DefaultHandler {
    private static final String TAG_BODY = "body";
    private boolean bodyFlag = false;
    private final StringBuilder body = new StringBuilder();

    @Override
    public void startDocument() throws SAXException {}

    @Override
    public void endDocument() throws SAXException {}

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        bodyFlag = TAG_BODY.equals(qName); // true only while inside <body>
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (bodyFlag) {
            bodyFlag = false;
            System.out.println(body.toString());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (bodyFlag) {
            // The text may arrive in several chunks, so keep appending.
            body.append(ch, start, length);
        }
    }
}
polypiel

You could set a flag, or use an enum, in startElement to indicate which element you're in, and interpret the text accordingly in the characters method.
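The enum variant could look roughly like this. It is a sketch, not the answerer's code: the enum NoteElement, the class names and the textOf helper are all hypothetical. startElement records the current element, endElement resets it, and characters files each chunk under the current element.

```java
import java.io.StringReader;
import java.util.EnumMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EnumStateExample {

    // Hypothetical enum: one constant per <note> child, plus NONE for "not inside one".
    enum NoteElement { TO, FROM, HEADING, BODY, NONE }

    static class NoteHandler extends DefaultHandler {
        private NoteElement current = NoteElement.NONE;
        private final Map<NoteElement, StringBuilder> texts = new EnumMap<>(NoteElement.class);

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Record which element we just entered.
            switch (qName) {
                case "to":      current = NoteElement.TO; break;
                case "from":    current = NoteElement.FROM; break;
                case "heading": current = NoteElement.HEADING; break;
                case "body":    current = NoteElement.BODY; break;
                default:        current = NoteElement.NONE;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            current = NoteElement.NONE;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Interpret the chunk according to the element we are currently in.
            if (current != NoteElement.NONE) {
                texts.computeIfAbsent(current, k -> new StringBuilder()).append(ch, start, length);
            }
        }
    }

    // Returns the text of the given element, or "" if it never appeared.
    static String textOf(String xml, NoteElement element) throws Exception {
        NoteHandler handler = new NoteHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        StringBuilder text = handler.texts.get(element);
        return text == null ? "" : text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        System.out.println(textOf(xml, NoteElement.BODY)); // prints: Don't forget me this weekend!
    }
}
```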

One thing I have done is keep a set of anonymous classes that correspond to tags. In startElement, I record which element I am in, so that I can use the right anonymous class to do what I want with the characters for that element (such as error handling, date formatting, or, in your case, printing the characters of the tag). I store those anonymous inner classes in a map with the tag name as the key. So in characters, I know which element I am in, and if I have a matching handler, I use it.

This approach is really useful when transforming XML input into classes.
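A rough sketch of the tag-to-handler map idea follows. It swaps lambdas in for the anonymous classes (same idea, less boilerplate), and all class and method names are hypothetical. One design change from the description: the text is buffered in characters and handed to the handler in endElement. The documentation quoted in an earlier answer allows the parser to split an element's text across several characters calls, so dispatching per call could give a handler only part of the text. The sketch also assumes the registered elements contain only text, with no child elements.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TagDispatchExample {

    // Routes the complete text of each registered tag to its own handler.
    static class DispatchingHandler extends DefaultHandler {
        private final Map<String, Consumer<String>> handlers;
        private final StringBuilder buffer = new StringBuilder();
        private String currentTag;

        DispatchingHandler(Map<String, Consumer<String>> handlers) {
            this.handlers = handlers;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            currentTag = qName;   // remember which element we are in
            buffer.setLength(0);  // start a fresh buffer for its text
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only buffer text for tags that have a handler registered.
            if (currentTag != null && handlers.containsKey(currentTag)) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            Consumer<String> handler = handlers.get(qName);
            if (handler != null) {
                handler.accept(buffer.toString());
            }
            currentTag = null;
        }
    }

    static String extractBody(String xml) throws Exception {
        StringBuilder result = new StringBuilder();
        Map<String, Consumer<String>> handlers = new HashMap<>();
        handlers.put("body", result::append); // "printing" here means collecting into result
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DispatchingHandler(handlers));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note><to>Tove</to><from>Jani</from>"
                + "<heading>Reminder</heading>"
                + "<body>Don't forget me this weekend!</body></note>";
        System.out.println(extractBody(xml)); // prints: Don't forget me this weekend!
    }
}
```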

Jeff C.