3

I want to parse some data from an XML file using a SAX parser. My XML is as follows:

<categories>
 <cat>Pies &amp; past</cat>
 <cat>Fruits</cat>
</categories>

In order to parse this data I extend DefaultHandler.

The output after parsing is:

cat 1 = Pies

cat 2 = &

cat 3 = past

cat 4 = Fruits

Why does this happen? I expected to get:

cat 1 = Pies & past

cat 2 = Fruits
giorgos_412
  • http://stackoverflow.com/questions/8770097/how-to-make-saxparser-ignore-escape-codes – Faruk Sahin Nov 11 '12 at 22:32
  • See [this](http://stackoverflow.com/questions/4567636/java-sax-parser-split-calls-to-characters#answer-4567652) for an answer. – ShyJ Nov 11 '12 at 22:47

2 Answers

10

My guess is that you are treating each call to characters as delivering the complete text for a cat element. You should code your handler so that successive calls to characters accumulate the text, and you only capture it on the endElement event:

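import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CatHandler extends DefaultHandler {
    // Accumulates text across successive characters() calls.
    private final StringBuilder chars = new StringBuilder();

    @Override
    public void startElement(String uri, String lName, String qName, Attributes a) {
        // SAX reports an unavailable qualified name as an empty string, so check for that too.
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            chars.setLength(0); // start a fresh buffer for this element
        } // else . . .
    }

    @Override
    public void endElement(String uri, String lName, String qName) {
        final String name = (qName == null || qName.isEmpty()) ? lName : qName;
        if ("cat".equals(name)) {
            // The element's text is complete only at this point.
            String catName = chars.toString();
            // do something with cat name
        } // else . . .
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element; append each chunk.
        chars.append(ch, start, length);
    }
}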
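To try this pattern end to end, here is a minimal self-contained sketch that parses the XML from the question. The names CatDemo, CollectingCatHandler and parseCats are made up for illustration, and the sketch assumes the default (non-namespace-aware) SAXParserFactory, so the element name arrives in qName:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CatDemo {

    // Hypothetical handler: collects the text of every <cat> element into a list.
    static class CollectingCatHandler extends DefaultHandler {
        final List<String> cats = new ArrayList<>();
        private final StringBuilder chars = new StringBuilder();

        @Override
        public void startElement(String uri, String lName, String qName, Attributes a) {
            if ("cat".equals(qName)) {
                chars.setLength(0); // start a fresh buffer for this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            chars.append(ch, start, length); // may run several times per element
        }

        @Override
        public void endElement(String uri, String lName, String qName) {
            if ("cat".equals(qName)) {
                cats.add(chars.toString()); // the text is complete only here
            }
        }
    }

    // Parses the given XML string and returns the text of each <cat> element.
    static List<String> parseCats(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CollectingCatHandler handler = new CollectingCatHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.cats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<categories>\n <cat>Pies &amp; past</cat>\n <cat>Fruits</cat>\n</categories>";
        List<String> cats = parseCats(xml);
        for (int i = 0; i < cats.size(); i++) {
            System.out.println("cat " + (i + 1) + " = " + cats.get(i));
        }
    }
}
```

Because the buffer is reset in startElement and read only in endElement, the result does not depend on how many characters() calls the parser makes for one element.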
Ted Hopp
3

A single characters() call is not guaranteed to deliver the complete text of an element. Instead, collect the text delivered by each characters() call and use the concatenated result when the corresponding endElement() call arrives.

From the doc:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks

(my emphasis)
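To observe the chunking directly, a small sketch like the following records every chunk passed to characters(). ChunkDemo and chunksOf are hypothetical names used only here. How many chunks you see depends on the parser implementation, so the only thing the sketch relies on is that the chunks, joined in order, give the full text:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChunkDemo {

    // Returns every chunk the parser passes to characters(), in order.
    static List<String> chunksOf(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                chunks.add(new String(ch, start, length));
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = chunksOf("<cat>Pies &amp; past</cat>");
        // The number of chunks is parser-dependent; only the joined text is fixed.
        System.out.println(chunks.size() + " chunk(s): " + chunks);
        System.out.println("joined: " + String.join("", chunks));
    }
}
```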

Brian Agnew