
I have this XML from SQL, and I want to produce the same output with Python 2.7 and lxml:

<?xml version="1.0" encoding="utf-16"?>
<results>
  <Country name="Germany" Code="DE" Storage="Basic" Status="Fresh" Type="Photo" />
</results>

Now I have:

from lxml import etree

# create XML
root = etree.Element('results')

country = etree.Element('country')
country.text = 'Germany'
root.append(country)

filename = "xmltestthing.xml"
FILE = open(filename, "w")
FILE.write(etree.tostring(root, pretty_print=True))
FILE.close()

How can I add the rest of the attributes?

user278618

4 Answers


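Use the lxml.builder E-factory, which takes attributes as keyword arguments. Note that this also prints the BOM (byte-order mark) that UTF-16 output starts with: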

>>> from lxml.etree import tostring
>>> from lxml.builder import E
>>> print tostring(
             E.results(
                 E.Country(name='Germany',
                           Code='DE',
                           Storage='Basic',
                           Status='Fresh',
                           Type='Photo')
             ), pretty_print=True, xml_declaration=True, encoding='UTF-16')

��<?xml version='1.0' encoding='UTF-16'?>
<results>
  <Country Status="Fresh" Type="Photo" Code="DE" Storage="Basic" name="Germany"/>
</results>
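The BOM can be checked directly on the serialized bytes. The sketch below is written for Python 3 and uses etree.SubElement instead of the E-factory so that, as an assumption for portability, it can fall back to the standard library's xml.etree.ElementTree (Python 3.8+) when lxml is not installed; both provide the calls used here.

```python
# Build the same <results><Country .../></results> tree and show that
# UTF-16 output starts with a byte-order mark (BOM).
try:
    from lxml import etree  # library used in the answers
except ImportError:
    import xml.etree.ElementTree as etree  # assumed stdlib fallback (3.8+)

root = etree.Element('results')
etree.SubElement(root, 'Country', name='Germany', Code='DE',
                 Storage='Basic', Status='Fresh', Type='Photo')

data = etree.tostring(root, xml_declaration=True, encoding='UTF-16')

# The first two bytes are the BOM: FF FE (little-endian) or FE FF (big-endian)
bom = data[:2]
print(repr(bom))

# The 'utf-16' codec reads the BOM to choose the byte order, then drops it
text = data.decode('utf-16')
print(text)
```

The stray characters in front of `<?xml` in the output above are those two BOM bytes printed to a console that does not interpret them.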
Marco Mariani
from lxml import etree

# Create the root element
page = etree.Element('results')

# Make a new document tree
doc = etree.ElementTree(page)

# Add the subelements
pageElement = etree.SubElement(page, 'Country',
                               name='Germany',
                               Code='DE',
                               Storage='Basic',
                               Status='Fresh',
                               Type='Photo')
# Pass as many attributes as you need as keyword arguments, as shown above

# Save to XML file
outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16') 
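The question's own approach (create an element with etree.Element, then append it) also works once the attributes are supplied: Element() accepts them as keyword arguments, and .set() adds them to an element that already exists. A minimal Python 3 sketch, with the same assumed fallback to the standard library when lxml is missing:

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed stdlib fallback

root = etree.Element('results')

# Attributes can be passed as keyword arguments when the element is created...
country = etree.Element('Country', name='Germany', Code='DE')
root.append(country)

# ...or added later with set(), one attribute at a time
country.set('Storage', 'Basic')
for key, value in [('Status', 'Fresh'), ('Type', 'Photo')]:
    country.set(key, value)

# Serialize to a str to inspect the result
print(etree.tostring(root, encoding='unicode'))
```

Setting `country.text`, as in the question, creates element content (`<country>Germany</country>`) rather than an attribute, which is why the original attempt did not match the SQL output.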
user225312
  • I would replace the last two lines with doc.write('output.xml', xml_declaration=True, encoding='utf-16') – systempuntoout Dec 17 '10 at 11:36
  • Well yes that is correct, but my main intention was to show how it is done rather than the eye candy ;) – user225312 Dec 17 '10 at 11:36
  • My xml now is: ਍㰀爀攀猀甀氀琀猀㸀㰀䌀漀甀渀琀爀礀 䌀漀搀攀㴀∀䐀䔀∀ 匀琀漀爀愀最攀㴀∀䈀愀猀椀挀∀ 渀愀洀攀㴀∀䜀攀爀洀愀渀礀∀⼀㸀㰀⼀爀攀猀甀氀琀猀㸀 – user278618 Dec 17 '10 at 11:52
  • It works for me, though. I wonder why; perhaps try opening it in Firefox (no particular reason, but worth trying). – user225312 Dec 17 '10 at 12:13
  • @sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. This is then molested by Win text mode to become 0D 0A 00 which makes everything after that look like UTF-16BE hence the Chinese etc characters when you display it. You can get around this in this instance by using "wb" instead of "w" when you open the file. However I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? You like large files and/or weird problems? – John Machin Dec 17 '10 at 21:20
  • @John Machin: Well thanks for clarifying this, I had no idea why this was happening. – user225312 Dec 18 '10 at 04:47
  • I know this is a matter of taste, but I prefer this answer to the accepted answer, which uses lxml reflection. – Alan Evangelista Feb 21 '14 at 14:33

Save to the XML file by passing the file name directly:

doc.write('output.xml', xml_declaration=True, encoding='utf-16')

instead of:

outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16')
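Both ways of saving are sketched below in Python 3, using UTF-8 as the comments on the previous answer recommend. When a file object is used, it has to be opened in binary mode ('wb'), because write() with an explicit encoding emits encoded bytes; a with-block makes sure it is closed. The temporary directory and file names are illustrative assumptions, and the stdlib fallback is assumed as before.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed stdlib fallback

root = etree.Element('results')
etree.SubElement(root, 'Country', name='Germany', Code='DE',
                 Storage='Basic', Status='Fresh', Type='Photo')
doc = etree.ElementTree(root)

tmpdir = tempfile.mkdtemp()
path_a = os.path.join(tmpdir, 'by_name.xml')    # hypothetical file names
path_b = os.path.join(tmpdir, 'by_handle.xml')

# 1. Pass the file name and let the library open and close the file
doc.write(path_a, xml_declaration=True, encoding='UTF-8')

# 2. Pass a file object opened in binary mode, closed by the with-block
with open(path_b, 'wb') as out_file:
    doc.write(out_file, xml_declaration=True, encoding='UTF-8')

# Read both files back as raw bytes for inspection
with open(path_a, 'rb') as f:
    bytes_a = f.read()
with open(path_b, 'rb') as f:
    bytes_b = f.read()
print(bytes_a.decode('utf-8'))
```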
Aziz Shaikh
Habib
  • Will this respect XML indentation? I am creating the XML file in a similar fashion, but I have formatting issues whenever I add an element. If I modify a tag or its text and write back to a new XML file it works fine; I don't know why it doesn't work with additions. Here is the format: testtest1test1 – suresh Apr 21 '16 at 14:31

Promoting my comment to an answer:

@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. This is then molested by Win text mode to become 0D 0A 00 which makes everything after that look like UTF-16BE hence the Chinese etc characters when you display it. You can get around this in this instance by using "wb" instead of "w" when you open the file. However I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? You like large files and/or weird problems?
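The mechanism can be reproduced without Windows by simulating text-mode newline translation on the encoded bytes. This is a plain Python 3 sketch; the bytes.replace() call stands in for what the Windows C runtime did to a file opened with "w" (an approximation, not the runtime's actual code). Decoding the shifted bytes yields U+0A0D followed by characters such as U+3C00 (㰀) and U+7200 (爀), the same pattern as the garbled output quoted in the comments above.

```python
# Text lxml would serialize: declaration, newline, then the start of the body
xml_text = u"<?xml version='1.0' encoding='UTF-16'?>\n<results>"

# UTF-16LE without a BOM keeps the byte offsets easy to follow;
# the newline becomes the two bytes 0A 00
raw = xml_text.encode('utf-16-le')

# Simulate Windows text mode: every 0A byte is written as 0D 0A,
# which inserts one extra byte and shifts everything after it
corrupted = raw.replace(b'\n', b'\r\n')

# Decoding the shifted bytes pairs each character's 00 high byte with the
# next character's low byte; the odd final byte becomes U+FFFD
garbled = corrupted.decode('utf-16-le', errors='replace')
print(garbled.encode('unicode_escape').decode('ascii'))

# Written in binary mode ('wb'), the bytes are untouched and decode cleanly
clean = raw.decode('utf-16-le')
```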

John Machin
  • Unfortunately "wb" didn't solve this issue for me, but the newlines were the cause, so I was able to work around the issue by writing the XML on one line (no pretty_print) and manually adding the declaration. As for "Why are you using UTF-16? You like large files and/or weird problems?": it could be (as in my case) that a third party requires a file in UTF-16. If you deal with interfaces from other parties, you don't always have control over what you send them. – GHZ Apr 15 '14 at 16:07
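The workaround from that last comment (serialize without pretty_print so there are no newlines, prepend the declaration by hand, encode once and write in binary mode) might look like this in Python 3. The file name and the exact declaration text are illustrative assumptions, and the stdlib fallback is assumed as in the earlier sketches.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree  # assumed stdlib fallback

root = etree.Element('results')
etree.SubElement(root, 'Country', name='Germany', Code='DE',
                 Storage='Basic', Status='Fresh', Type='Photo')

# Serialize to a str: no declaration and no pretty-printing, so no newlines
body = etree.tostring(root, encoding='unicode')

# Add the declaration by hand, on the same line as the body
document = u'<?xml version="1.0" encoding="utf-16"?>' + body

# Encode in one step (the 'utf-16' codec prepends a BOM) and write in
# binary mode so no newline translation can touch the bytes
data = document.encode('utf-16')
path = os.path.join(tempfile.mkdtemp(), 'output.xml')  # hypothetical path
with open(path, 'wb') as out_file:
    out_file.write(data)
```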