
I am trying to define an XML entity that contains only a newline character (&#10;). Unfortunately, this does not seem to work:

<?xml version="1.0"?>
<!DOCTYPE my_doc [
 <!ENTITY newline "&#10;">
]>

<root attr="hello &newline; world !!!" attr1="hello &#10; world !!!" ></root>

In the example above I expect the attributes attr and attr1 to have the same value. But in attr, parsers replace the entity newline with a space:

attr => hello   world !!!
attr1 => hello
 world !!!

I am using Python to parse this, but I do not think that is relevant:

import xml.etree.ElementTree as ET

data_as_string = """<?xml version="1.0"?>
<!DOCTYPE my_doc [
 <!ENTITY newline "&#10;">
]>

<root attr="hello &newline; world !!!" attr1="hello &#10; world !!!" ></root>
"""

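# Parse the document; the parser expands &newline; and &#10; in the attribute values.
root = ET.fromstring(data_as_string)

print(root)
print(root.attrib)

# Print each attribute; attr1 spans two lines, attr does not.
for k, v in root.attrib.items():
    print("%s => %s" % (k, v))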
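To see exactly which character ends up in each attribute, it can help to print repr() of the values, which shows a line feed as \n instead of an actual line break. A small sketch, assuming Python 3 and the standard-library xml.etree.ElementTree (which parses with expat):

```python
import xml.etree.ElementTree as ET

# Same document as above; the XML declaration must start at the very first character.
data_as_string = """<?xml version="1.0"?>
<!DOCTYPE my_doc [
 <!ENTITY newline "&#10;">
]>
<root attr="hello &newline; world !!!" attr1="hello &#10; world !!!"></root>
"""

root = ET.fromstring(data_as_string)

# repr() makes whitespace visible: a normalized space shows as ' ', a line feed as '\n'.
for name in ("attr", "attr1"):
    print(name, "=>", repr(root.attrib[name]))
```

This prints attr => 'hello   world !!!' (three spaces) and attr1 => 'hello \n world !!!', which confirms that only the entity reference was turned into a space.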

Does anyone know a solution for this?

Thanks, Gerald

mzjn
nutrina
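This is a little tricky, but I think that the behaviour you are seeing is in accordance with https://w3.org/TR/xml/#AVNormalize. The character reference &#10; inside the ENTITY declaration is already expanded when the declaration is read, so the replacement text of &newline; is a literal line feed. When an entity reference appears in an attribute value, its replacement text is normalized, and each whitespace character in it (#x20, #xD, #xA, #x9) becomes a space character (#x20). Character references written directly in the attribute value are not normalized in this way. – mzjn Jan 03 '17 at 11:34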
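Following the normalization rules linked above, one workaround that should follow from the spec (a sketch, checked only against ElementTree/expat, not every parser) is to make the entity's replacement text the character reference itself rather than the character. Declaring the entity as <!ENTITY newline "&#38;#10;"> means the &#38; is expanded when the declaration is read, so the replacement text is the text &#10;. This is the same double-escaping trick the XML 1.0 specification uses in its entity-expansion examples. When &newline; is then expanded inside an attribute, that text is handled as a character reference, which normalization leaves as a real line feed. Assumes Python 3 and xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Workaround sketch: "&#38;#10;" is expanded once when the DTD is read, so the
# replacement text of &newline; is the character reference "&#10;" rather than
# a literal line feed. Inside the attribute it is then treated as a character
# reference, which attribute-value normalization does not turn into a space.
data_as_string = """<?xml version="1.0"?>
<!DOCTYPE my_doc [
 <!ENTITY newline "&#38;#10;">
]>
<root attr="hello &newline; world !!!" attr1="hello &#10; world !!!"></root>
"""

root = ET.fromstring(data_as_string)

for name in ("attr", "attr1"):
    # repr() shows a line feed as \n, so the two values can be compared by eye.
    print(name, "=>", repr(root.attrib[name]))

# Both attributes should now hold the same string.
print(root.attrib["attr"] == root.attrib["attr1"])
```

Because neither attribute has an ATTLIST declaration, a non-validating parser treats both as CDATA, so no further collapsing of spaces is applied; with a tokenized attribute type (e.g. NMTOKENS) the line feed would still be folded away.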
