While migrating to Python 3, I noticed that some files we generate with the built-in csv module now have a b'...' wrapper around each string.

Here's the code, which should generate a .csv for a list of dogs according to the fields defined by export_fields (whose getters always return unicode data):

file_content = StringIO()
csv_writer = csv.writer(
    file_content, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL
)
csv_writer.writerow([
    header_name.encode('cp1252') for _v, header_name in export_fields
])
# Write content
for dog in dogs:
    csv_writer.writerow([
        get_value(dog).encode('cp1252') for get_value, _header in export_fields
    ])

The problem is that once I return file_content.getvalue(), I get:

b'Does he bark?'    b'Full     Name'    b'Gender'
b'Sometimes, yes'   b'Woofy the dog'    b'Male' 

Instead of (indentation has been modified to be readable on SO):

'Does he bark?'   'Full     Name'   'Gender'
'Sometimes, yes'  'Woofy the dog'   'Male' 

I did not find any encoding parameter in the csv module. I would like the whole file to be encoded in cp1252, so I don't really care whether the encoding is done while iterating over the rows or on the constructed file itself.

So, does anyone know how to generate a proper string containing only cp1252-encoded strings?

Martijn Pieters
Maxime Lorant
  • Why are you encoding in the first place? The open file object takes care of that. – Martijn Pieters Jul 29 '16 at 10:52
  • @MartijnPieters Maybe my question is incomplete then: I want to return the string through Django: `return HttpResponse(generate_csv_file())`. Should I handle encoding at Django level instead? – Maxime Lorant Jul 29 '16 at 10:55
  • See my answer; you are approaching this at the wrong level; tabs and quotechars need to be encoded too, but this is the job of the I/O level, not the `csv` module or the code producing rows. – Martijn Pieters Jul 29 '16 at 10:57

1 Answer

The csv module deals with text, and converts anything that is not a string to a string using str().

Don't pass in bytes objects. Pass in str objects or types that cleanly convert to strings with str(). That means you should not encode strings.
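
The b'...' wrapper in the question's output follows directly from this: calling str() on a bytes object yields its repr. Here is a minimal sketch that reproduces the symptom next to the fix. The helper name write_row is made up for illustration and is not part of the csv module:

```python
import csv
from io import StringIO


def write_row(cells):
    """Write one tab-delimited row to an in-memory text buffer and return it."""
    buffer = StringIO()
    csv.writer(buffer, delimiter='\t').writerow(cells)
    return buffer.getvalue()


# bytes cells are converted with str(), which produces their repr
encoded_row = write_row(['Gender'.encode('cp1252'), 'Male'.encode('cp1252')])

# str cells are written unchanged
text_row = write_row(['Gender', 'Male'])

print(repr(encoded_row))  # "b'Gender'\tb'Male'\r\n"
print(repr(text_row))     # 'Gender\tMale\r\n'
```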

If you need cp1252 output, encode the StringIO value:

file_content.getvalue().encode('cp1252')

as StringIO objects also deal in text only.
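
Put together, the encode-at-the-end approach might look like the sketch below. The rows are the sample data from the question, hard-coded here as an assumption in place of the real export_fields getters:

```python
import csv
from io import StringIO

# Sample rows taken from the question; real code would build them from export_fields
rows = [
    ['Does he bark?', 'Full     Name', 'Gender'],
    ['Sometimes, yes', 'Woofy the dog', 'Male'],
]

file_content = StringIO()
csv_writer = csv.writer(
    file_content, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_writer.writerows(rows)  # plain str values, nothing encoded yet

# Encode the whole document once, delimiters and line endings included
encoded = file_content.getvalue().encode('cp1252')
```

Note that str.encode() raises UnicodeEncodeError if a value contains a character cp1252 cannot represent, so real data may need an explicit errors= policy.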

Better yet, use a BytesIO object with a TextIOWrapper() to do the encoding for you as the csv module writes to the file object:

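import csv
from io import BytesIO, TextIOWrapper

file_content = BytesIO()
# newline='' stops the wrapper from translating the '\r\n' line endings csv writes
wrapper = TextIOWrapper(
    file_content, encoding='cp1252', newline='', line_buffering=True)
csv_writer = csv.writer(
    wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)

# write rows

result = file_content.getvalue()

I've enabled line-buffering on the wrapper so that it auto-flushes to the BytesIO instance every time a row is written. Passing newline='' keeps the wrapper from translating the line endings the csv module writes, which would otherwise produce '\r\r\n' on Windows.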

Now file_content.getvalue() produces a bytestring:

>>> from io import BytesIO, TextIOWrapper
>>> import csv
>>> file_content = BytesIO()
>>> wrapper = TextIOWrapper(file_content, encoding='cp1252', newline='', line_buffering=True)
>>> csv_writer = csv.writer(wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
>>> csv_writer.writerow(['Does he bark?', 'Full     Name', 'Gender'])
36
>>> csv_writer.writerow(['Sometimes, yes', 'Woofy the dog', 'Male'])
35
>>> file_content.getvalue()
b'Does he bark?\tFull     Name\tGender\r\nSometimes, yes\tWoofy the dog\tMale\r\n'
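
Applied to the question's loop, this could be wrapped in a function along these lines. This is only a sketch: generate_csv_bytes is a hypothetical name, export_fields is assumed to be a list of (getter, header) pairs as the question implies, and the dict-based dog record is invented for the example. It flushes explicitly instead of relying on line buffering:

```python
import csv
from io import BytesIO, TextIOWrapper


def generate_csv_bytes(dogs, export_fields, encoding='cp1252'):
    """Return the tab-delimited export as bytes in the given encoding."""
    file_content = BytesIO()
    # newline='' leaves the '\r\n' line endings written by csv untouched
    wrapper = TextIOWrapper(file_content, encoding=encoding, newline='')
    csv_writer = csv.writer(
        wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Header row, then one row per dog; all values stay str
    csv_writer.writerow([header for _getter, header in export_fields])
    for dog in dogs:
        csv_writer.writerow(
            [get_value(dog) for get_value, _header in export_fields])

    # Push any buffered text through the encoder before reading the bytes
    wrapper.flush()
    return file_content.getvalue()


# Hypothetical field definitions and data, for illustration only
export_fields = [
    (lambda dog: dog['barks'], 'Does he bark?'),
    (lambda dog: dog['name'], 'Full     Name'),
    (lambda dog: dog['gender'], 'Gender'),
]
dogs = [{'barks': 'Sometimes, yes', 'name': 'Woofy the dog', 'gender': 'Male'}]

result = generate_csv_bytes(dogs, export_fields)
```

The bytes are read before the function returns because the wrapper closes the underlying BytesIO once it is garbage-collected.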
Martijn Pieters