Myanmar characters encoding in Python 3.4

Question

UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-12

I get this error when I try to insert a string in the Myanmar language into a Jinja2 template and save the rendered template. I have installed all the needed fonts in the OS and tried using the codecs module.

The process: a Python script parses a CSV file with data and builds a dictionary, which is then used to fill the variables in the Jinja2 template. The error is raised at the moment of writing to the file. I'm using Python 3.4.

There is a package called python-myanmar, but it's for Python 2.7 and I do not want to downgrade my code. I have already read all of this: http://www.unicode.org/notes/tn11/, http://chimera.labs.oreilly.com/books/1230000000393/ch02.html#_discussion_31, and the https://code.google.com/p/python-myanmar/ package, and I installed the system fonts. I can encode the string with .encode('utf-8'), but I can't then .decode() it without an error!

The question is: without downgrading, how can I write the data to the file? Installing something additional is an option, but ideally I'd use only what is built into Python 3.4.

C:\Users\...\autocrm.py in create_templates(csvfile_location, csv_delimiter, template_location, countries_to_update, push_onthefly, csv_gspreadsheet, **kwargs)
    270                 ### use different parsers for ventures due to possible difference in website design
    271                 ### checks if there is a link in CSV/TSV
--> 272                 if variables['promo_link'] != '':
    273                     article_values = soup_the_newsletter_article(variables['promo_link'])
    274                 if variables['item1_link'] != '':

C:\Users\...\autocrm.py in push_to_ums(countries_to_update, html_template, **kwargs)
    471                     ### save to import.xml
    472                     with open(xml_path_upload, 'w') as writefile:
--> 473                         writefile.write(template.render(**values))
    474                         print('saved the import.xml')
    475

C:\Python34\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 6761-6772: character maps to <undefined>

BTW, why does the traceback point to cp1252.py if my sys.getdefaultencoding() output is UTF-8?

My current code for rendering and saving the template:

    from jinja2 import Template

    with open(template_location, 'r') as raw_html:
        template = Template(raw_html.read())
        print('writing to template: ' + variables['country_id'])
        with open('rendered_templates_L\\NL_' +
                  variables['country_id'] + ".html", 'w', encoding='utf-8') as writefile:
            rendered_template = template.render(**alldata)
            writefile.write(rendered_template)

Note that the default encoding for files is taken from `locale.getpreferredencoding(False)`, not `sys.getdefaultencoding()`. It is the former that returns `cp1252` for your system. — Martijn Pieters, May 20 '14 at 09:46
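To see this difference on a given machine, a small check like the following can help. It is only a sketch (`report_encodings` is a hypothetical helper name), and the values it reports are platform-dependent: on the setup in the question one would expect `preferred` and `open_default` to show `cp1252`, while `default` shows `utf-8`.

```python
import locale
import os
import sys
import tempfile


def report_encodings():
    """Collect the encodings involved when open() is called without encoding=."""
    # Codec used by str.encode()/bytes.decode() when none is given; 'utf-8' on Python 3.
    default = sys.getdefaultencoding()
    # Codec that open() falls back to in text mode when encoding= is omitted.
    preferred = locale.getpreferredencoding(False)
    # Cross-check: open a scratch file without encoding= and read back what it chose.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as f:
            open_default = f.encoding
    finally:
        os.remove(path)
    return {"default": default, "preferred": preferred, "open_default": open_default}


if __name__ == "__main__":
    print(report_encodings())
```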

Answer · answered May 19 '14 at 13:25

You opened the output file without specifying an encoding, so the platform's default encoding for files is used; here that is CP1252.

Rendering the Jinja2 template produces a Unicode string, which has to be encoded when it is written to the file, but the default encoding (CP1252) has no mapping for the Myanmar codepoints it contains.
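A minimal reproduction without Jinja2 or files: the sample string below is the word "Myanmar" in Myanmar script, chosen here only as an example. Encoding it with CP1252 raises the same `'charmap'` error as in the traceback, while UTF-8 encodes it and decodes it back losslessly as long as the same codec is used in both directions. `try_encode` is a hypothetical helper for the illustration.

```python
# The word "Myanmar" in Myanmar script (U+1019 U+103C U+1014 U+103A U+1019 U+102C).
MYANMAR = "\u1019\u103c\u1014\u103a\u1019\u102c"


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError if codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


if __name__ == "__main__":
    err = try_encode(MYANMAR, "cp1252")
    print(type(err).__name__, err)  # the same 'charmap' error as in the traceback

    data = try_encode(MYANMAR, "utf-8")
    print(len(data))                        # 18: each of these codepoints takes 3 bytes in UTF-8
    print(data.decode("utf-8") == MYANMAR)  # True: decoding with the same codec round-trips
```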

The solution is to pick an explicit codec. If you are producing XML, UTF-8 is the default encoding for XML and can handle all of Unicode:

    with open(xml_path_upload, 'w', encoding='utf8') as writefile:
        writefile.write(template.render(**values))
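The same fix as a small self-contained sketch. `save_rendered` is a hypothetical helper that takes an already-rendered string; in the code above you would pass it the result of `template.render(**values)`. The second call passes `cp1252` explicitly only to simulate what leaving out `encoding` does on a CP1252 locale.

```python
import os
import tempfile


def save_rendered(path, rendered, encoding="utf-8"):
    """Write an already-rendered template (a str) to path with an explicit codec."""
    with open(path, "w", encoding=encoding) as writefile:
        writefile.write(rendered)


if __name__ == "__main__":
    text = "<title>\u1019\u103c\u1014\u103a\u1019\u102c</title>"  # sample Myanmar text
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        save_rendered(path, text)  # UTF-8 can encode every codepoint, so this succeeds
        with open(path, encoding="utf-8") as f:
            print(f.read() == text)  # True
        try:
            # Simulates what omitting encoding= does on a cp1252 locale.
            save_rendered(path, text, encoding="cp1252")
        except UnicodeEncodeError as exc:
            print("cp1252 fails:", exc)
    finally:
        os.remove(path)
```

Usage in the question's code would then be `save_rendered(xml_path_upload, template.render(**values))`.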

Martijn Pieters

No, I actually pass the `encoding='utf-8'` parameter when opening the file for writing... and it still points to cp1252. – boldnik May 19 '14 at 13:46
@boldnik: can you confirm that `value = template.render(**values)` (without writing) works? You get a `UnicodeEncodeError` exception, *encoding* fails. And the exception points directly to the `writefile.write()` call; if it was the `template.render()` call that raised the exception you'd get a deeper traceback, but the `.write()` method calls into C code so the traceback stops right there. – Martijn Pieters May 19 '14 at 14:18
@boldnik: your traceback does contradict you, btw; the source code shown in the traceback shows you did not use an `encoding` parameter when opening that file. – Martijn Pieters May 19 '14 at 14:19
@boldnik: and which line is producing the exception? I see you also open the template for reading without specifying an encoding. Make sure the file is read with the encoding it was saved in. I don't think that is the issue here, but it is something to keep in mind. – Martijn Pieters May 19 '14 at 14:38
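To illustrate why the reading side matters too: if a UTF-8 template is read with CP1252 (the locale default in this question), Myanmar text may not raise anything at all. Depending on the bytes involved, it either turns silently into mojibake, which Jinja2 would then render as-is, or fails with `UnicodeDecodeError`. A minimal sketch using bytes in memory instead of a real template file:

```python
# The word "Myanmar" in Myanmar script, as it would be stored in a UTF-8 file.
MYANMAR = "\u1019\u103c\u1014\u103a\u1019\u102c"
raw = MYANMAR.encode("utf-8")

# Decoding with the codec the bytes were written in restores the text exactly.
good = raw.decode("utf-8")

# Decoding the same bytes as CP1252 raises nothing here, because every byte
# in this particular sequence is defined in CP1252; the result is mojibake.
bad = raw.decode("cp1252")

# Other Myanmar letters produce bytes CP1252 leaves undefined (e.g. 0x81 in the
# UTF-8 form of U+1001), and then decoding fails loudly instead.
try:
    "\u1001".encode("utf-8").decode("cp1252")
    loud_failure = False
except UnicodeDecodeError:
    loud_failure = True

print(good == MYANMAR)  # True
print(ascii(bad[:3]))   # '\xe1\u20ac\u2122', i.e. the mojibake 'á€™'
print(loud_failure)     # True
```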
`writefile.write(rendered_template)` – boldnik May 19 '14 at 17:40
@boldnik: so it is *still* the writing that fails. What is `writefile.encoding`? – Martijn Pieters May 19 '14 at 17:43
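Following the debugging steps suggested in these comments (render first, then inspect `writefile.encoding` before writing), a sketch like the one below reports exactly which characters the file's codec cannot represent. `find_unencodable` and `check_before_write` are hypothetical helper names, not part of Jinja2 or the standard library; an `io.TextIOWrapper` over an in-memory buffer stands in for a file returned by `open()`.

```python
import io


def find_unencodable(text, encoding):
    """Return (index, character) pairs that `encoding` cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, ch))
    return bad


def check_before_write(writefile, rendered):
    """Write rendered, or raise a descriptive error if writefile's codec cannot hold it."""
    problems = find_unencodable(rendered, writefile.encoding)
    if problems:
        raise ValueError(
            "file encoding %r cannot encode %d character(s), first at index %d"
            % (writefile.encoding, len(problems), problems[0][0])
        )
    writefile.write(rendered)


if __name__ == "__main__":
    rendered = "abc \u1019\u103c"  # stand-in for template.render(**alldata)
    ok = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    check_before_write(ok, rendered)  # passes: UTF-8 covers every character
    print([i for i, _ in find_unencodable(rendered, "cp1252")])  # [4, 5]
```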
@boldnik: curious; so the `template.render(**alldata)` produces a Unicode string without a problem, `writefile.encoding` is definitely UTF-8 and you *still* get your `UnicodeEncodeError` **in the `cp1252.py` file**? – Martijn Pieters May 20 '14 at 09:39
Well... yes. And I'm scratching my head over how it's happening. – boldnik May 20 '14 at 18:17
@boldnik: can you post the current traceback to a gist or pastie or something? – Martijn Pieters May 20 '14 at 18:18
Your IPython traceback had more context, and that traceback shows you didn't separate the `template.render(**values)` call from the `writefile.write()` call either. From it I cannot see how `writefile` was opened. – Martijn Pieters May 21 '14 at 09:21
