
In my XML file, I have a tag whose content contains a special character:

<journal>Universit&auml;t Trier</journal>

When I run the SAX parser, it splits the string into two parts:

String 1: Universit
String 2: &auml;t Trier

However, I need to read the content as one whole string so that I can store it properly in the database; I can't have it split into two strings. Why does the SAX parser do this anyway?

The following method is the part of my SAX handler that does the reading:

public void characters(char ch[], int start, int length) throws SAXException 
{
                       ...
}
nwellnhof
user2741620
  • Look at http://stackoverflow.com/questions/13336140/sax-parsing-and-special-characters and http://stackoverflow.com/questions/8770097/how-to-make-saxparser-ignore-escape-codes – Dan Oct 12 '13 at 19:07
  • Could you flag the answer as accepted? You should also clean up your other posts and provide some follow-up. TIA – Ludovic Kuty Oct 25 '13 at 10:09

1 Answer


This is not a bug.

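It is simply the way SAX is designed: a parser is allowed to deliver the text of a single element in several calls to characters, and entity references are a common place for it to split. It needs to be this way in order to have any possibility of dealing with mixed content.

If your elements contain no mixed content, handling this is actually quite simple.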

What you need to do is to recombine the fragments in your own implementation of the SAX ContentHandler interface.

Typically this means initializing a StringBuilder or StringBuffer field in the startElement method, appending to it in the characters method and converting it to a String in the endElement method.
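
As a rough illustration of that pattern, here is a minimal sketch. The class name JournalHandler and the parse helper are made up for this example, and it assumes the entity &auml; is declared somewhere (here in an internal DTD subset), since it is not one of XML's predefined entities.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <journal> element.
class JournalHandler extends DefaultHandler {
    private final List<String> journals = new ArrayList<>();
    private StringBuilder text; // non-null only while inside <journal>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("journal".equals(qName)) {
            text = new StringBuilder(); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element; just append each fragment.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("journal".equals(qName)) {
            journals.add(text.toString()); // the fragments are now one String
            text = null;
        }
    }

    List<String> getJournals() {
        return journals;
    }

    // Convenience helper (an assumption for this sketch): parse XML from a String.
    static List<String> parse(String xml) throws Exception {
        JournalHandler handler = new JournalHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getJournals();
    }

    public static void main(String[] args) throws Exception {
        // The internal DTD subset declares &auml; so the parser can expand it.
        String xml = "<!DOCTYPE journal [<!ENTITY auml \"&#228;\">]>"
                + "<journal>Universit&auml;t Trier</journal>";
        System.out.println(parse(xml));
    }
}
```

The buffer is only non-null while the parser is inside a journal element, so whitespace and text from other elements are ignored. In your case you would write the value to the database in endElement instead of adding it to a list.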

Don Roby