I'm working on a German-language app that receives its data as XML. I parse the XML with a SAX parser and display the data in a TextView. Everything works fine except that special characters are garbled after parsing.

This is the XML I get from the URL. It is UTF-8 encoded, and all the characters in the XML file itself are correct.

<?xml version="1.0" encoding="utf-8"?>
<posts>
    <page id="001">
        <title><![CDATA[Sie kaufen bei uns ausschließlich Holzkunst- und Volkskunst-Produkte ]]></title>
        <detial><![CDATA[Durch enge Beziehungen mit unseren Lieferanten können wir attraktive rückläufig 
        Preise und schnelle Lieferungen gewährleisten. Caroline Féry and Laura Herbst Universität Potsdam Mein 
        Flugzeug hatte zwölf Stunden VERSPÄTUNG </p>]]></detial>
    </page>     
</posts>

I use a SAX parser to parse this XML and display the parsed data in the TextView:

public class GermanParseActivity extends Activity {
    /** Called when the activity is first created. */

    static final String URL = "http://www.xyz.com/id=1";

    ItemList itemList;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        XMLParser parser = new XMLParser();
        String XML = parser.getXmlFromUrl(URL);

        System.out.println("This XML is ========>"+XML);

        try
        {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            XMLReader xr = sp.getXMLReader();

            /** Create handler to handle XML Tags ( extends DefaultHandler ) */
            MyXMLHandler myXMLHandler = new MyXMLHandler();
            xr.setContentHandler(myXMLHandler);

            ByteArrayInputStream is = new ByteArrayInputStream(XML.getBytes());
            xr.parse(new InputSource(is));
        }
        catch(Exception e)
        {

        }

        itemList = MyXMLHandler.itemList;

        ArrayList<String> listItem = itemList.getTitle();

        ListView lview = (ListView) findViewById(R.id.listview1);
        myAdapter adapter = new myAdapter(this, listItem);
        lview.setAdapter(adapter);
    }


}

After parsing, however, I get strange characters that are not in the XML file; they only appear once the XML has been parsed.

Like these characters:

before parsing ---> after parsing

können ---> können

rückläufig ---> rückläufig

gewährleisten ---> gewährleisten

Can anyone please suggest the proper way to fix this issue?

Widor
user755278

2 Answers

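You need to re-encode your input. The problem is that the text is UTF-8 but is being interpreted as ISO-8859-1. That looks like a SAX bug at first, but because the fix is applied to the string before it is parsed, the wrong decoding more likely happens earlier, when the downloaded bytes are converted to a String.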

String output=new String(input.getBytes("8859_1"), "utf-8");

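That line encodes the garbled string as ISO-8859-1, which recovers the original bytes, and then decodes those bytes as UTF-8. (Java strings themselves are stored as UTF-16; UTF-8 here is only the encoding of the bytes being decoded.)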
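As a minimal sketch of why this round trip works, the example below first reproduces the garbling (UTF-8 bytes decoded as ISO-8859-1) and then reverses it. The class name `ReencodeDemo` and the helper `fixMojibake` are made up for illustration, and `StandardCharsets` assumes Java 7 or later; on older platforms the charset names can be passed as strings instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class ReencodeDemo {

    /** Reverses a string whose UTF-8 bytes were wrongly decoded as ISO-8859-1. */
    static String fixMojibake(String garbled) {
        // ISO-8859-1 maps each char 0..255 to exactly one byte, so this
        // recovers the original UTF-8 byte sequence without loss.
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset they were actually written in.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "k\u00f6nnen"; // "können"

        // Reproduce the bug: UTF-8 bytes read as if they were ISO-8859-1.
        String garbled = new String(
                correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals("k\u00c3\u00b6nnen"));     // true
        System.out.println(fixMojibake(garbled).equals(correct));    // true
    }
}
```

The repair only works if the string was garbled in exactly this way. Applying it to a string that was decoded correctly would damage the text instead.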

rekire
  • Can you please suggest where I should use this line of code in my GermanParseActivity class above? Thanks – user755278 May 24 '12 at 07:11
  • Around your `System.out.println("This XML is ========>"+XML);` line, with the variable XML as both input and output. – rekire May 24 '12 at 07:42
  • Thanks a lot, it worked for me. I had been searching for a long time, and many people are looking for the same thing. You have given the simplest answer. Thank you rekire. Cheers!! – user755278 May 24 '12 at 08:01
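To illustrate the placement described in these comments, here is a rough, self-contained sketch of re-encoding the fetched string and then handing it to SAX. It is not the asker's actual code: `ParseAfterReencode`, `TitleHandler` and `parseTitles` are hypothetical stand-ins (the real `MyXMLHandler` and `XMLParser` are not shown in the question), and it assumes the string was garbled by an ISO-8859-1 decode. It also passes an explicit charset to `getBytes`, since the no-argument form uses the platform default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; class and method names are illustrative only.
class ParseAfterReencode {

    /** Collects the text of every <title> element (stand-in for MyXMLHandler). */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    /** Repairs a string that was wrongly decoded as ISO-8859-1, then parses it. */
    static List<String> parseTitles(String xml) {
        // The re-encoding step, applied right after the XML string is fetched.
        String fixed = new String(xml.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        try {
            TitleHandler handler = new TitleHandler();
            // Explicit UTF-8 so the bytes match the encoding in the XML declaration.
            byte[] utf8 = fixed.getBytes(StandardCharsets.UTF_8);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new ByteArrayInputStream(utf8)), handler);
            return handler.titles;
        } catch (Exception e) {
            // Surface parse failures instead of silently swallowing them.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String correct = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<posts><page id=\"001\"><title><![CDATA[ausschlie\u00dflich]]></title></page></posts>";
        // Simulate the garbled string the download code is assumed to return.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(parseTitles(garbled).get(0).equals("ausschlie\u00dflich")); // true
    }
}
```

If the raw response bytes are available, passing them to the parser directly would avoid the String detour altogether, since the parser can then decode them according to the XML declaration.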

I got my answer from here. They suggest that the XML declaration should be:

<?xml version="1.0" encoding="ISO-8859-1"?>

instead of

<?xml version="1.0" encoding="utf-8"?>

Hope that is the answer. Edit: I just saw that you don't have control over the XML, so this will not help; rekire's answer is then an option.

Community
mariomario