I'm parsing the following XML using a SAX parser:

<Person>
<Name>Test</Name>
<Phone>111-111-2222</Phone>
<Address>lee h&amp;y</Address>
</Person>

The characters method of the SAX parser only reads the address data up to 'lee h', as if it does not treat '&' as a character. I need the complete text of the Address element. Any ideas on how I should do it? This is my SAX handler (here address is a flag indicating that an Address element has been encountered):

boolean address = false;

public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {

    if (qName.equalsIgnoreCase("Address")) {
        address = true;
    }
}

public void characters(char ch[], int start, int length)
        throws SAXException {

    String data = new String(ch, start, length);

    if (address) {
        System.out.println("Address is: " + data);
        address = false;
    }
}

The output is: lee h

Srinivas

2 Answers

The characters method is called several times here (with a typical parser, three times: for "lee h", "&", and "y") to report the content of the Address element, because the text contains the entity reference &amp;. Accumulate the content from each call to characters until you receive the endElement event; at that point you have the complete content.

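Please note the documentation of the characters method: a SAX parser may return all contiguous character data in a single chunk, or it may split it into several chunks.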

You could also benefit from the use of the ignorableWhitespace method with a validating parser and the appropriate schema (e.g. DTD) to let the parser know which spaces are ignorable (due to indentation).

In Java, it could be:

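import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MyHandler extends DefaultHandler {

    // Collects the text reported by successive calls to characters().
    private StringBuilder acc;

    public MyHandler() {
        acc = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // The element is closed, so the accumulated text is complete.
        System.out.printf("Characters accumulated: %s%n", acc.toString());
        acc.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // May be called several times for one element: append, do not overwrite.
        acc.append(ch, start, length);
    }
}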
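Applied to the handler from the question, the same accumulation idea might look like the sketch below. The names AddressDemo, AddressHandler, and parseAddress are made up for illustration, and the XML is passed as an in-memory string only to keep the example self-contained. It assumes the default, non-namespace-aware SAXParserFactory, so qName holds the plain element name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AddressDemo {

    // Hypothetical handler: collects only the text inside <Address>.
    static class AddressHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inAddress = false;
        private String address;

        @Override
        public void startElement(String uri, String localName,
                String qName, Attributes attributes) {
            if (qName.equalsIgnoreCase("Address")) {
                inAddress = true;
                text.setLength(0); // start with an empty buffer
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append every chunk; the flag is NOT reset here.
            if (inAddress) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equalsIgnoreCase("Address")) {
                address = text.toString(); // content is complete only now
                inAddress = false;
            }
        }

        String getAddress() {
            return address;
        }
    }

    // Parses the given XML and returns the text of the last <Address> element.
    static String parseAddress(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        AddressHandler handler = new AddressHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAddress();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + parseAddress(xml));
    }
}
```

The difference from the code in the question is that the flag is cleared in endElement rather than after the first call to characters, so every chunk of the element's text is appended before it is read.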
Ludovic Kuty
  • I have edited my initial question to include the implementation of my SAX parser. Can you please take a look and tell me where I'm going wrong? I am trying to read all the characters in the Address element. Thanks. – Srinivas Oct 17 '11 at 19:59
  • As I told you, you have to accumulate characters in a StringBuilder before getting it like I did in my example. – Ludovic Kuty Oct 18 '11 at 10:37
  • You could also take a look at this [answer](http://stackoverflow.com/questions/6527506/extracting-text-nodes-from-xml-file-using-sax-parser-in-java/6527661#6527661). It is exactly the same method. – Ludovic Kuty Oct 18 '11 at 10:59
The answer depends to some extent on which parser you're using.

Here's a thorough rundown on the issue: http://www.ibm.com/developerworks/xml/library/x-tipsaxdo4/index.html

With a StAX parser you can set the property javax.xml.stream.isCoalescing (XMLInputFactory.IS_COALESCING) to true. This property specifies whether the parser coalesces adjacent character data into a single event.

But with SAX there is no such control, generally.
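For comparison, the StAX option mentioned above could be sketched roughly as follows. The class name StaxCoalescingDemo and the method readAddress are hypothetical, and the XML is supplied as a string for brevity. With IS_COALESCING enabled, the reader is expected to report the text of Address as a single CHARACTERS event rather than several.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxCoalescingDemo {

    // Returns the text of the first <Address> element, or null if there is none.
    static String readAddress(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character data into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "Address".equals(reader.getLocalName())) {
                    // With coalescing on, the next event carries the whole text.
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        return reader.getText();
                    }
                    return ""; // empty element
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Person><Name>Test</Name><Phone>111-111-2222</Phone>"
                + "<Address>lee h&amp;y</Address></Person>";
        System.out.println("Address is: " + readAddress(xml));
    }
}
```

This sketch assumes Address contains only text. For an element with mixed content, or without coalescing, the text would still have to be accumulated across events, much as with SAX.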

Mike Sokolov