I'm reading data from a CSV file, processing it, and then writing it into a text template.
The problem occurs when I come across characters that I cannot encode.
For example, when I come across a value written in Chinese, the field appears blank when I open the file in a CSV editor (e.g. LibreOffice Calc on Linux).
But when I get the data via csv.reader in my script, I can see that it is actually a string that hasn't been decoded properly. And when I try to write it to a template, I get this weird SUB string.
Here is the breakdown of the problem:
import csv
from string import Template

for row in csv.DictReader(csvfile):
    # take value from the row and store it in a dictionary
    ....
    # take the values from the dictionary and write them to a template
    with open('template.txt', 'r+') as template:
        src = Template(template.read())
        content = src.substitute(rec)
    with open('myoutput.txt', 'w') as bill:
        bill.write(content)
And the template.txt looks like this:
$name
$address
$city
...
All of this generates txt files like this:
Bill
North Grove 14
Scottsdale
...
If any of the dictionary values is empty, e.g. an empty string '', my template rendering function ignores the tag. So, for example, if the address attribute were missing from a particular row, the output would be:
Bill
Scottsdale
...
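To make the skipping behaviour above concrete, here is a minimal sketch of one way such a rendering step could work. The helper name `render_skipping_empty` and the sample `rec` values are hypothetical and only for illustration; the actual rendering function is not shown in this question. Note that this version would also drop any line that is blank in the template itself.

```python
from string import Template

def render_skipping_empty(template_text, values):
    """Render a $-style template line by line, dropping lines that end up empty.

    Hypothetical helper: a stand-in for the rendering function described above.
    """
    rendered_lines = []
    for line in template_text.splitlines():
        rendered = Template(line).safe_substitute(values)
        # Keep the line only if something other than whitespace remains.
        if rendered.strip():
            rendered_lines.append(rendered)
    return "\n".join(rendered_lines)

template_text = "$name\n$address\n$city"
rec = {"name": "Bill", "address": "", "city": "Scottsdale"}  # assumed sample row
print(render_skipping_empty(template_text, rec))
```

With the sample values, the address line is dropped and only the name and city lines are printed.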
When I try to do that with my Chinese data, my function does write the values, because the strings in question are not empty. The rendered output then looks like this:
SUB
SUB
Hong Kong
...
How can I display my data properly? Also, is there a way to skip that data, for example by trying to decode each value and converting it to an empty string if decoding fails?
P.S. try/except won't work here, because mystring.encode('utf-8') or mystring.encode('latin-1') will encode the string without raising an error, but it will still be output as garbage.
EDIT
After printing out the problem row, the output of the problematic values is the following:
{'Name': '\x1a \x1a\x1a', 'State': '\x1a\x1a\x1a'}
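The character '\x1a' is the ASCII SUB (substitute) control character, which would explain both the "SUB" shown in the output and why encode() never raises: the string is valid, it just no longer contains the original text. A minimal sketch of the fallback I have in mind, assuming that any value containing SUB should be treated as empty (the helper name `blank_if_substituted` and the extra 'City' key are hypothetical, added only for illustration):

```python
SUB = "\x1a"  # ASCII substitute control character

def blank_if_substituted(value):
    """Return '' if the value contains the SUB placeholder, else the value unchanged."""
    return "" if SUB in value else value

# The first two values are copied from the printed row; 'City' is an assumed extra field.
row = {'Name': '\x1a \x1a\x1a', 'State': '\x1a\x1a\x1a', 'City': 'Hong Kong'}
cleaned = {key: blank_if_substituted(value) for key, value in row.items()}
print(cleaned)
```

This only hides the damaged fields so the template skips them. It cannot recover the Chinese text, which appears to have been replaced before the data reached csv.DictReader.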