I want to generate

<td>&nbsp;</td>

using XOM.

None of these work:

    private static void test(String s) {
      Element e = new Element("td");
      e.appendChild(s);
      System.out.println("XML(\"" + s + "\"): " + e.toXML());
    }

    private static void test() throws UnsupportedEncodingException {
      final String nbsp = "\u00A0";
      final String nbsp2 = "\uC2A0";
      final String nbsp3 = "&#038;nbsp;";
      test(nbsp);
      test(nbsp2);
      test(nbsp3);
      test("&nbsp;");
      final byte[] b = nbsp.getBytes("UTF-8");
      test(new String(b, "UTF-8"));
    }

I get

XML(" "): <td> </td>
XML("슠"): <td>슠</td>
XML("&#038;nbsp;"): <td>&amp;#038;nbsp;</td>
XML("&nbsp;"): <td>&amp;nbsp;</td>
XML(" "): <td> </td>
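The second result is expected: C2 A0 is the UTF-8 byte sequence for U+00A0, not a code point, so "\uC2A0" names a different character (a Hangul syllable). A minimal sketch using only the JDK makes the difference visible; the class name `NbspBytes` and the helper `utf8Hex` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class NbspBytes {
    // Hypothetical helper: the UTF-8 bytes of s as space-separated upper-case hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The no-break space: one code point, two bytes in UTF-8.
        System.out.println(utf8Hex("\u00A0"));
        // A different code point entirely: three bytes in UTF-8.
        System.out.println(utf8Hex("\uC2A0"));
    }
}
```

So the first and last attempts already put a genuine no-break space into the element; it is simply written out as the literal character rather than as a reference.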

Any ideas?

Character encoding is set to "UTF-8" in my IDE.

OldCurmudgeon

1 Answer

I suggest you don't use toXML() but use nu.xom.Serializer instead, which should give an explicit numeric character reference (&#160;) for the non-breaking space. If you really need &nbsp; you may have to subclass Serializer and override the Text methods.

To use Serializer try:

    OutputStream out = new FileOutputStream(file);
    Serializer ser = new Serializer(out);
    ser.write(doc);
    out.close();

If you have to subclass Serializer, it gets trickier.

peter.murray.rust
  • Try Serializer first and see what it gives. If it gives &#160;, will that be good enough for you? (All browsers and other tools should process it.) – peter.murray.rust Aug 06 '13 at 22:57
  • A bit of poking around in the Serializer source suggests it uses `Text` too, which is the primary cause of the issue. I'll see what I can work out. Right now I am doing a `String.replace("&amp;nbsp;", "&nbsp;")` on the `toXML` output, but that is a horrible hack. – OldCurmudgeon Aug 06 '13 at 23:03
  • Yes, it's horrible - I went down that route. Serializer uses a TextWriter which allows individual characters to be output, but it's private so a subclass can't use it. That seems wrong: it should be possible to override write(Text) and just subvert the TextWriter (escaper). – peter.murray.rust Aug 06 '13 at 23:13
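For reference, a variation on the post-processing workaround from the comments can be sketched with plain string handling. Instead of un-escaping `&amp;nbsp;`, it appends a real U+00A0 to the element and swaps the literal character for a reference in the serialized text. The class `NbspPostProcess` and helper `replaceNbsp` are hypothetical names, and the input string merely stands in for the `toXML()` output shown in the question.

```java
public class NbspPostProcess {
    // Hypothetical helper: replaces every literal U+00A0 in the serialized
    // markup with the given reference text, e.g. "&nbsp;" or "&#160;".
    static String replaceNbsp(String xml, String reference) {
        return xml.replace("\u00A0", reference);
    }

    public static void main(String[] args) {
        // Stand-in for e.toXML() after e.appendChild("\u00A0").
        String xml = "<td>\u00A0</td>";
        System.out.println(replaceNbsp(xml, "&nbsp;"));
        System.out.println(replaceNbsp(xml, "&#160;"));
    }
}
```

Note that `&nbsp;` is not one of XML's predefined entities, so output containing it is only well-formed when a DTD declares it (as the XHTML DTDs do). The numeric form `&#160;` needs no declaration.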