1

In my software I'm receiving an XML file that contains some HTML entities such as `&amp;`. I can decode the XML successfully, but not the HTML entities: the strings are cut off wherever an HTML entity occurs. Can anybody help? This is the code I currently use to parse the XML:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputStream inputStream = entity.getContent();
    Document dom = builder.parse(inputStream);
    inputStream.close();

    Element racine = dom.getDocumentElement();
    NodeList nodeLst = racine.getElementsByTagName("product");

Does anyone know how I can do the same job, that is, parse the XML into a DOM object while also decoding the HTML entities?

At the moment my DOM object is not correct, because it contains strings that were cut off at the HTML entities. What can I do?

Fabien
  • Can you expand on what exactly is in the XML file? Is it, for example, `A&B` or `A&amp;B`? And what exactly do you need as the end result, `A&B` or `A&amp;B`? And what do you mean by "cutting"? – RoToRa Nov 09 '10 at 10:26

3 Answers

1

I have two approaches to suggest:

  1. Deactivate validation: `factory.setValidating(false);`

  2. Add an XHTML DOCTYPE declaration to your XML stream, immediately after the `<?xml ...?>` declaration, so that the parser knows the HTML entities:

    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
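
As a rough illustration of the second suggestion: a parser can only resolve a named entity such as `&eacute;` if the document's DOCTYPE declares it. The sketch below swaps the external XHTML DTD for a small internal subset so that it runs without network access. The class name `EntityDeclDemo`, the sample document, and the single declared entity are assumptions made for illustration, not part of the original code. Note also that `DocumentBuilderFactory` is non-validating by default, so `setValidating(false)` only makes that explicit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class: shows that an entity declared in the DOCTYPE is resolved by the DOM parser.
class EntityDeclDemo {

    // Sample document (an assumption): the internal DTD subset declares &eacute; as U+00E9.
    static String sampleXml() {
        return "<?xml version=\"1.0\"?>"
             + "<!DOCTYPE products [ <!ENTITY eacute \"&#233;\"> ]>"
             + "<products><product>Caf&eacute;</product></products>";
    }

    // Parses the XML and returns the full text of the first <product> element.
    static String firstProductText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // already the default; stated explicitly here
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document dom = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // getTextContent() concatenates all text below the element.
        return dom.getElementsByTagName("product").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstProductText(sampleXml()));
    }
}
```

With the external XHTML DTD from the suggestion above, the parser would have to download the DTD to learn the entity definitions, which may be slow or blocked depending on the environment.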

Jean Hominal
  • Thanks for your answer. I cannot test it right now because we took another approach and changed the way our server sends data to us. I hope this answer helps other people. – Fabien Nov 10 '10 at 13:14
  • How do I deactivate validation when I call `getResources().getXml(R.xml.laws)`? – Daniel Ryan Sep 06 '11 at 23:56
  • @Zammbi: I believe you should be able to deactivate validation by using the [`XmlPullParser`](http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html) interface and the `setFeature` method. I would suggest asking a new question if you need more information. – Jean Hominal Sep 12 '11 at 08:56
1

I think this happens because the parser treats the apostrophe (`'`) as the end of the string. I found a workaround: replace the apostrophe entities before parsing.

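    // Read the whole response into a String and replace the apostrophe entities before parsing.
    String stringDatosEntrada = new Scanner(urlConnection.getInputStream()).useDelimiter("\\A").next()
            .replaceAll("&amp;#39;", "'").replaceAll("&#39;", "'");

    InputStream is = new ByteArrayInputStream(stringDatosEntrada.getBytes());
    Document dom = builder.parse(is); // parse the cleaned stream, not the original one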
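
The same pre-processing idea can be generalised beyond the apostrophe. The following is only a sketch under stated assumptions: the class `HtmlEntityPreprocessor`, its method name, and the tiny entity table are hypothetical, and a real table would need to cover the full HTML entity list. It rewrites known HTML named entities as numeric character references, which an XML parser accepts without any DTD.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts HTML named entities to numeric character references.
class HtmlEntityPreprocessor {

    // Deliberately tiny, illustrative table (name -> Unicode code point).
    private static final Map<String, Integer> NAMED = Map.of(
            "nbsp", 160, "eacute", 233, "egrave", 232, "copy", 169, "euro", 8364);

    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    // Names missing from the table are left untouched, which includes the
    // five entities predefined by XML (amp, lt, gt, quot, apos).
    static String toNumericReferences(String xml) {
        Matcher m = ENTITY.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = NAMED.get(m.group(1));
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The result of `toNumericReferences(stringDatosEntrada)` could then be wrapped in a `ByteArrayInputStream` and passed to `builder.parse(...)` in the same way as above.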
Yi Jiang
0

You could try Android's `Html` class. It does not recognise every HTML tag, but it does seem to work for decoding entities in strings:

    Html.fromHtml(htmlString)

Here is a simple example:

    TextView tv = (TextView) findViewById(R.id.tv);
    String s = "<b>This is</b> my first <u>HTML String</u> &amp; it works well!";
    tv.setText(Html.fromHtml(s));

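The `TextView` then displays "This is my first HTML String & it works well!", with "This is" in bold and "HTML String" underlined.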

Scoobler
  • I know about this function, thanks. But it cannot help, because my DOM object is already invalid (the strings inside are cut off). By then it is too late to use this function. I need another way to parse the XML file, one that accepts HTML entities and does not cut the strings. – Fabien Nov 09 '10 at 09:46
  • Perhaps this site would be more helpful: [Using XPATH and HTML Cleaner to parse HTML / XML](http://thinkandroid.wordpress.com/2010/01/05/using-xpath-and-html-cleaner-to-parse-html-xml)? – Scoobler Nov 09 '10 at 09:50
  • See this very similar post, where the user used `XmlPullParser`: [Parsing html numbers in xml](http://stackoverflow.com/questions/4132092/parsing-html-numbers-like-189-in-dom-parser-android/4132536#4132536). Maybe it helps? – Scoobler Nov 09 '10 at 18:06