I've converted my scripts from Python 2.7 to 3.2, and I have a bug.

# -*- coding: utf-8 -*-
import time
from datetime import date
from lxml import etree
from collections import OrderedDict

# Create the root element
page = etree.Element('results')

# Make a new document tree
doc = etree.ElementTree(page)

# Add the subelements
pageElement = etree.SubElement(page, 'Country',Tim = 'Now', 
                                      name='Germany', AnotherParameter = 'Bye',
                                      Code='DE',
                                      Storage='Basic')
pageElement = etree.SubElement(page, 'City', 
                                      name='Germany',
                                      Code='PZ',
                                      Storage='Basic',AnotherParameter = 'Hello')
# For multiple attributes, pass them as keyword arguments as shown above

# Save to XML file
outFile = open('output.xml', 'w')
doc.write(outFile) 

On the last line, I got this error:

builtins.TypeError: must be str, not bytes
File "C:\PythonExamples\XmlReportGeneratorExample.py", line 29, in <module>
  doc.write(outFile)
File "c:\Python32\Lib\site-packages\lxml\etree.pyd", line 1853, in lxml.etree._ElementTree.write (src/lxml/lxml.etree.c:44355)
File "c:\Python32\Lib\site-packages\lxml\etree.pyd", line 478, in lxml.etree._tofilelike (src/lxml/lxml.etree.c:90649)
File "c:\Python32\Lib\site-packages\lxml\etree.pyd", line 282, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:7972)
File "c:\Python32\Lib\site-packages\lxml\etree.pyd", line 378, in lxml.etree._FilelikeWriter.write (src/lxml/lxml.etree.c:89527)

I've installed Python 3.2, and I've installed lxml-2.3.win32-py3.2.exe.

On Python 2.7, it works.

cottontail
user278618
  • Did not really investigate this, but a quick guess is that you should open the file in binary mode. – Sven Marnach Apr 01 '11 at 11:39
  • Related: https://stackoverflow.com/questions/13906623/using-pickle-dump-typeerror-must-be-str-not-bytes (with the pickle library, not lxml) – user202729 Jan 19 '21 at 14:33

3 Answers

The outfile should be in binary mode.

outFile = open('output.xml', 'wb')
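
As a minimal sketch of why this works: the serializer hands encoded bytes to the file's write() method, and a file opened in text mode accepts only str. The example below uses the standard library's xml.etree.ElementTree instead of lxml (an assumption, made so that it runs without third-party packages), and the temporary output path is hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small tree similar to the one in the question.
page = ET.Element('results')
ET.SubElement(page, 'Country', name='Germany', Code='DE')
doc = ET.ElementTree(page)

# Hypothetical output location inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'output.xml')

# Binary mode: write() accepts the encoded bytes the serializer produces.
with open(out_path, 'wb') as out_file:
    doc.write(out_file, encoding='utf-8', xml_declaration=True)

# Reading the file back in binary mode returns bytes.
with open(out_path, 'rb') as in_file:
    written = in_file.read()
```

With lxml, passing a filename instead of a file object (doc.write('output.xml')) should also avoid the problem, since the library then opens the file itself.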
Lennart Regebro
  • Mind blown. Python3 has reimagined what to do with that little 'b'. It used to only annoy Windows users who would forget to include it (or couldn't because they were using stdio). Now it can annoy Python users on all platforms. Hopefully, it will be worth the pain. – Brent Bradburn Aug 17 '13 at 06:11
  • If you are parsing text it is definitely worth it. – Lennart Regebro Jan 15 '14 at 21:56
  • @nobar It is required to e.g. switch off Universal newline support, http://legacy.python.org/dev/peps/pep-0278/ , which is on by default in Python 3 – user7610 Jul 26 '14 at 15:28
  • Works for me in gzip for python3 too! `json.load(gzip.open('file.json.gz'))` fails, and `json.load(gzip.open('file.json.gz', 'rt'))` succeeds! – hobs Nov 18 '16 at 19:24
  • @LennartRegebro, Not if the system setting is unexpected. Binary is best and less error prone. If it works it really does work. As for text, there's always a "what if" involved. – Pacerier Feb 16 '17 at 19:14
  • I stand by my previous comment. And no, binary "does not really work if it works". In fact, there are multiple ways of silently failing when you deal with text as if it's binary. That's the whole point of the Unicode/String type. – Lennart Regebro Feb 28 '17 at 11:57
  • Weird, I ran into a huge misunderstanding here. The error says `must be str, not bytes`, and you read that as if it needs a string, and you think that `b` stands for `bytes` here. Then by simply giving it a chance, following the herd only, I tried `wb`. And that really solved it. And I only see now that the `b` stands for `binary`, not `bytes`. Hope I am right now ;). – questionto42 Jan 31 '22 at 20:31
  • Yes, because then it needs bytes. – Lennart Regebro Feb 02 '22 at 08:13
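
The gzip comment above follows the same text/binary split. A small sketch, assuming a throwaway file in a temporary directory (the path and the JSON payload are made up for illustration): gzip.open defaults to binary mode, and 'rt' adds a text layer on top. Note that on Python 3.6 and later json.loads also accepts bytes, so the failing call from that comment may not reproduce there.

```python
import gzip
import json
import os
import tempfile

# Hypothetical file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'file.json.gz')

# Write a small JSON document through gzip in text mode.
with gzip.open(path, 'wt', encoding='utf-8') as f:
    json.dump({'city': 'Berlin'}, f)

# The default mode is 'rb': read() returns bytes.
with gzip.open(path) as f:
    raw = f.read()

# 'rt' wraps the stream in a text layer: json.load receives str.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    data = json.load(f)
```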

Convert a binary file to base64 and back again. Tested in Python 3.5.2.

import base64

# Read the binary file and encode its bytes as base64
with open('/tmp/newgalax.png', 'rb') as read_file:
    data = read_file.read()

b64 = base64.b64encode(data)
print(b64)

# Decode the base64 data and save it back to a binary file
decode_b64 = base64.b64decode(b64)
with open('/tmp/out_newgalax.png', 'wb') as out_file:
    out_file.write(decode_b64)
djperalta

If, for whatever reason, the output file was opened with mode='w' and cannot be reopened with 'wb', a workaround is to use the .buffer attribute of the TextIOWrapper. It exposes the underlying BufferedWriter (the same type of object that open() returns for mode='wb'), which accepts bytes.

s = """
<country name="Liechtenstein">
    <year>2008</year>
    <gdppc>141100</gdppc>
</country>
"""
import xml.etree.ElementTree as ET
doc = ET.ElementTree(ET.fromstring(s))

outFile = open('output.xml', 'w')
doc.write(outFile.buffer)             # <--- buffer here
outFile.close()
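
A short sketch of what .buffer gives you, using a hypothetical temporary path: the text wrapper and its buffer write to the same file, so it seems safest to flush the text layer before writing bytes directly, otherwise pending text could land after them.

```python
import io
import os
import tempfile

# Hypothetical output location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.xml')

text_file = open(path, 'w', encoding='utf-8')

# The text wrapper sits on top of a binary buffered writer.
is_buffered_writer = isinstance(text_file.buffer, io.BufferedWriter)

text_file.write('<!-- header -->\n')
text_file.flush()  # push pending text down before writing bytes directly
text_file.buffer.write(b'<root/>\n')
text_file.close()

# Read the result back as bytes to inspect the order of the writes.
with open(path, 'rb') as f:
    content = f.read()
```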
cottontail