
I have an XML document that contains the following:

<math display='block'><mtext>&#x2009;</mtext></math>

When this is loaded into Qt (I found the problem in the Qt MathML widget), the QDomDocument object drops the Unicode thin space character (U+2009). This Python example demonstrates the problem:

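from PyQt4.QtXml import QDomDocument

d = QDomDocument()
# The <mtext> element contains only a thin space (U+2009).
d.setContent("<math display='block'><mtext>&#x2009;</mtext></math>")
# Python 2: serialize the document and show the resulting unicode string.
print repr(unicode(d.toString()))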

The output from this code is:

u'<math display="block">\n <mtext/>\n</math>\n'

Inserting an extra non-space character after the thin space stops the thin space being lost.

Is this my mistake, an XML feature, or does Qt have a bug?

xioxox

1 Answer


From QDomDocument's documentation:

Text nodes consisting only of whitespace are stripped and won't appear in the QDomDocument. If this behavior is not desired, one can use the setContent() overload that allows a QXmlReader to be supplied.

Loading the document this way keeps whitespace-only text nodes (example in C++):

QXmlSimpleReader reader;
QXmlInputSource source;
QDomDocument dom;

// Pass the reader explicitly so that whitespace-only text nodes are kept.
source.setData(QString("<mtext>&#x2009;</mtext>"));
dom.setContent(&source, &reader);
  • Hm-m-m. Except that `&#x2009;` isn't whitespace. The XML rec says that only "space (#x20) characters, carriage returns, line feeds, or tabs" are whitespace. The rec also says, "A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications." So if Qt is treating `&#x2009;` as whitespace, it might be simpler for @xioxox to alert the XML processor thusly: `` – Roger_S Jun 10 '12 at 17:12
  • Well, QDomDocument treats the thin space character as whitespace, which does not follow the XML specification. QDomDocument also does not seem to support the xml:space attribute. So xioxox could [make a bug report](https://bugreports.qt-project.org/secure/Dashboard.jspa). Until that is fixed, the `setContent()` overload that takes a QXmlReader works. –  Jun 10 '12 at 17:46
  • Thanks - I'll accept this as the solution, and file a bug report! – xioxox Jun 10 '12 at 18:31
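
To illustrate the point made in the comments, here is a minimal sketch that uses only the Python standard library, not Qt. It compares the XML definition of white space with a Unicode-aware whitespace test, and cross-checks the document from the question against a different parser (`xml.dom.minidom`). The helper `is_xml_whitespace_only` and the constant names are hypothetical, introduced only for this example. Whether Qt strips the thin space because it uses a Unicode-style whitespace test is an assumption, not something the documentation quoted above states.

```python
import xml.dom.minidom

# The four characters that the XML 1.0 recommendation treats as white space:
# space, tab, carriage return and line feed.
XML_WHITESPACE = "\x20\x09\x0d\x0a"

THIN_SPACE = u"\u2009"


def is_xml_whitespace_only(text):
    """Return True if text is non-empty and contains only XML white space."""
    return len(text) > 0 and all(ch in XML_WHITESPACE for ch in text)


# Parse the document from the question with a non-Qt parser.
doc = xml.dom.minidom.parseString(
    "<math display='block'><mtext>&#x2009;</mtext></math>")
mtext = doc.getElementsByTagName("mtext")[0]
parsed_text = mtext.firstChild.data

print(parsed_text == THIN_SPACE)           # True: minidom keeps the thin space
print(THIN_SPACE.isspace())                # True: Unicode counts it as whitespace
print(is_xml_whitespace_only(THIN_SPACE))  # False: XML does not
```

A parser that decides "whitespace-only" with a Unicode-aware test such as `isspace()` would discard this text node, while one that follows the XML definition would keep it.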