I have a bytes object containing a UTF-8 encoded XML file (say, file1). I need to save this file to a directory as an XML file, so I convert it into an ElementTree with the following code:

import xml.etree.ElementTree as ET
tree = ET.ElementTree(ET.fromstring(file1))

I expect that converting it back with the following line produces bytes that are also UTF-8 encoded and exactly equal to file1.

file2 = ET.tostring(tree.getroot(), encoding='utf-8', method='xml')

To be clear, I expect file1 == file2 to return True, yet it returns False. Looking at the bytes objects, I can see that file1 starts with the following line, yet this line is missing from file2.

b'<?xml version="1.0" encoding="UTF-8"?> #file1

Any ideas on what I'm missing?

C.Acarbay
  • If you just want to write the bytes to disk, there's no need to parse the structure, just open a file in binary mode and `.write()` the bytes object. – The differences might come from minor serialisation differences, eg. from the order of the attributes. For example, `` is equivalent to `` in terms of XML, but it's not the same string. – lenz Jan 25 '19 at 14:58
  • Oh, just realised you're missing the XML declaration. Check out [this post](https://stackoverflow.com/q/12966488) about how you can preserve it. – lenz Jan 25 '19 at 15:17
  • @lenz: `` and `` are not equivalent. The latter form is invalid. `version="1.0"` must always come first, followed by `encoding="..."`. – mzjn Jan 26 '19 at 06:56
  • Thanks @mzjn, I didn't know the XML declaration was special in this respect. In regular tags the order of the attributes is irrelevant. – lenz Jan 26 '19 at 08:07
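
The points raised in the comments can be sketched roughly as follows. The sample content of `file1` and the file name `saved.xml` are made up for illustration and are not taken from the question. The sketch shows that `ET.tostring` with `encoding='utf-8'` leaves out the XML declaration, that requesting one through `ElementTree.write(..., xml_declaration=True)` still lets the serializer write its own form of it, and that writing the original bytes in binary mode avoids the round trip altogether.

```python
import io
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical stand-in for the asker's bytes object.
file1 = b'<?xml version="1.0" encoding="UTF-8"?>\n<root><item id="1">text</item></root>'

root = ET.fromstring(file1)

# Serializing to UTF-8 bytes: no XML declaration is emitted by default.
file2 = ET.tostring(root, encoding='utf-8', method='xml')
has_declaration = file2.startswith(b'<?xml')

# Asking for the declaration explicitly via ElementTree.write().
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding='utf-8', xml_declaration=True)
with_declaration = buffer.getvalue()

# Saving the original bytes untouched: open in binary mode and write.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / 'saved.xml'  # assumed file name
    with open(target, 'wb') as fh:
        fh.write(file1)
    saved = target.read_bytes()

print(has_declaration)            # does file2 start with a declaration?
print(with_declaration == file1)  # re-serialized bytes vs. the original
print(saved == file1)             # bytes written directly vs. the original
```

In this sketch only the direct binary write reproduces `file1` exactly. The re-serialized variants describe the same XML tree, but the serializer chooses its own quoting and layout, so comparing them to the input with `==` is not a reliable check.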
