-1

Here is the code:

s = 'Waitematā'
w = open('test.txt','w')
w.write(s)
w.close()

I get the following error.

UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 8: character maps to <undefined>

The string will print with the macron a, ā. However, I am not able to write this to a .txt or .csv file.

Am I able to swap out the macron a, ā, for one with no macron? Thanks in advance for the help.

2 Answers

0

Note that when you open a file with open('text.txt', 'w') and write a string to it, what ends up in the file is not the string itself but its encoded bytes. Which encoding is used depends on your LANG environment variable or other platform settings.
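To make the encoding step concrete, here is a minimal sketch that encodes the same string two ways. It assumes the failing default on the asker's machine is cp1252 (a common Windows code page that Python implements as a 'charmap' codec); the question does not confirm which code page was actually in use.

```python
s = 'Waitematā'

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = s.encode('utf-8')
print(utf8_bytes)

# Assumption: cp1252 stands in for the platform default here.
# It has no slot for U+0101 (ā), so encoding fails.
try:
    s.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print(exc)
```

Writing to a text-mode file performs the same encode step behind the scenes, which is why the error only shows up on write and not when the string is created.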

To force UTF-8, as you suggested in the title, you can try this:

w = open('text.txt', 'wb')  # 'b' opens the file in binary mode
w.write(s.encode('utf-8'))  # encode the str to bytes explicitly
w.close()
adrtam
0

As documented in open:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
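If you want to see which encoding your own machine would pick, a quick check along these lines should work. The variable name default_encoding is only for illustration, and the value printed will differ from system to system.

```python
import locale

# Ask Python which encoding open() falls back to in text mode
# when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

On many Windows setups this reports a legacy code page such as cp1252, while most Linux and macOS systems report UTF-8.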

Not all encodings support all Unicode characters. Since the encoding is platform dependent when not specified, it is better and more portable to be explicit and call out the encoding when reading or writing a text file. UTF-8 supports all Unicode code points:

s = 'Waitematā'
with open('text.txt', 'w', encoding='utf8') as w:
    w.write(s)
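
The question also asks whether the macron can simply be dropped. One possible approach, offered here only as a sketch, is to decompose the string with the standard unicodedata module and discard the combining marks. The helper name strip_marks is hypothetical, and note that this loses information, so writing UTF-8 as above is usually the better fix.

```python
import unicodedata

def strip_marks(text):
    # NFD splits 'ā' into 'a' plus a combining macron (U+0304).
    decomposed = unicodedata.normalize('NFD', text)
    # Keep only characters that are not combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks('Waitematā'))  # Waitemata
```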
Mark Tolonen