Error (unicode error) 'utf-8' codec can't decode byte - code = compile(f.read(), fname, 'exec')

Question

I'm new to Python. I'm trying to run this code:

llaves=("España","Francia","Inglaterra")
dicPaises={llaves[0]:"Madrid",llaves[1]:"Paris",llaves[2]:"Londres"}
print(dicPaises)

The result is the following error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\program files\microsoft visual studio\2022\community\common7\ide\extensions\microsoft\python\core\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 267, in run_path
    code, fname = _get_code_from_file(run_name, path_name)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\lib\runpy.py", line 242, in _get_code_from_file
    code = compile(f.read(), fname, 'exec')
  File "C:\Users\JANSIR\source\repos\Pruebas\Pruebas.py", line 8
    llaves=("España","Francia","Inglaterra")
                    ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xf1 in position 4: invalid continuation byte
The program 'python.exe' has exited with code 1 (0x1).

I'm using Visual Studio and Python 3.9. I already tried adding `# -*- coding: utf-8 -*-`, but it does not work.

I can run your code without problem. Did you type it with VS, or did you copy/paste it? — Thierry Lathuille, Jul 26 '22 at 14:24
I typed it in VS. It's really strange. Could it be a problem with my Visual Studio? And how could I solve it? — Jhancir Poveda, Jul 26 '22 at 14:58
Follow [UTF-8 Everywhere](https://utf8everywhere.org/) and save the script with `UTF-8` encoding. — JosefZ, Jul 26 '22 at 16:11

whatf0xx · Answer 1 · score 1 · 2022-07-26T17:42:16.773

By the looks of things, the compiler is having trouble with the 'ñ': it is the only non-ASCII character in the line, and it sits at position 4 (counting from 0) of the string literal `"España"`, which is the position the error message reports. See this answer for a similar problem and its solution:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 2: invalid continuation byte
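A minimal sketch that reproduces the failure outside Visual Studio, assuming the script was saved as Latin-1 (cp1252 encodes 'ñ' as the same byte, 0xf1). It passes raw bytes to `compile()`, roughly what `runpy` does in the traceback (`compile(f.read(), fname, 'exec')`). It also suggests why `# -*- coding: utf-8 -*-` did not help: the declaration only tells Python how to read the bytes, it does not re-encode them, so only a declaration that matches the bytes on disk compiles. The helper name `try_compile` is hypothetical.

```python
# Sketch: reproduce the question's SyntaxError by compiling raw bytes,
# assuming the editor saved the file as Latin-1 ('ñ' -> byte 0xf1).
SOURCE = 'llaves=("España","Francia","Inglaterra")\n'

def try_compile(header: str) -> str:
    """Compile SOURCE (encoded as Latin-1) with an optional coding line in front."""
    data = (header + SOURCE).encode("latin-1")
    try:
        compile(data, "Pruebas.py", "exec")  # bytes source, decoded by Python itself
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"
    return "ok"

print(try_compile(""))                              # no declaration: UTF-8 is assumed
print(try_compile("# -*- coding: utf-8 -*-\n"))     # declaration does not match the bytes
print(try_compile("# -*- coding: latin-1 -*-\n"))   # declaration matches the bytes
```

The first two calls report a SyntaxError (the exact wording varies between Python versions); the third prints `ok`.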

It seems your encodings have got mixed up. `\xf1`, the byte reported in the error, is how 'ñ' is encoded in Latin-1:

>>> "España".encode("latin1")
b'Espa\xf1a'
>>> b'\xf1'.decode('latin1')
'ñ'

Whereas UTF-8 encodes the ñ as a single character that takes two bytes, `\xc3\xb1`:

>>> b'\xc3\xb1'.decode('utf-8')
'ñ'
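The mismatch can be shown directly with the built-in codecs, as a small sketch: the Latin-1 bytes for "España" are not valid UTF-8, and decoding them as UTF-8 fails with the same byte, position and reason as the traceback in the question. 'ñ' stays one character in both encodings; it just takes one byte in Latin-1 and two in UTF-8.

```python
text = "España"

latin1_bytes = text.encode("latin-1")  # b'Espa\xf1a': 'ñ' is the single byte 0xf1
utf8_bytes = text.encode("utf-8")      # b'Espa\xc3\xb1a': 'ñ' is two bytes

print(len(text), len(latin1_bytes), len(utf8_bytes))  # 6 6 7

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xf1 announces a 4-byte UTF-8 sequence, but the next byte ('a') is not
    # a continuation byte, so decoding fails at index 4.
    print(exc)  # 'utf-8' codec can't decode byte 0xf1 in position 4: invalid continuation byte
```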

As @JosefZ points out, you should always use UTF-8 encoding and tell your editor to save files as UTF-8 instead of Latin-1. I think you can find the options you need here (the page covers VS Code):

https://learn.microsoft.com/en-us/powershell/scripting/dev-cross-plat/vscode/understanding-file-encoding?view=powershell-7.2

If you search the page (Ctrl+F) for "Configuring VS Code", you should find the info you need.
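If the editor setting is hard to find, the file itself can also be re-saved as UTF-8 from Python. This is a minimal sketch, assuming the file is currently in cp1252 (the usual Windows code page for Spanish; it encodes 'ñ' as 0xf1, exactly like Latin-1). The helper `convert_to_utf8` is hypothetical, and it leaves a file alone if it is already valid UTF-8, so running it twice does not double-encode anything.

```python
from pathlib import Path

def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-save `path` as UTF-8 and return True if the file was rewritten.

    Hypothetical helper: assumes the current encoding is `source_encoding`.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")      # already valid UTF-8 (or plain ASCII): nothing to do
        return False
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)   # interpret the bytes as cp1252/Latin-1
    path.write_bytes(text.encode("utf-8"))
    return True
```

For example, `convert_to_utf8(Path("Pruebas.py"))`, then reopen the file in the editor. Because it works on bytes rather than text-mode I/O, the original line endings are kept as they were.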

edited Jul 26 '22 at 17:42

answered Jul 26 '22 at 15:08

whatf0xx


Yes, using a comment `coding: latin1` at the top of the code should work. – Mark Ransom Jul 26 '22 at 15:22
No, no, no! Stay at [UTF-8 Everywhere](https://utf8everywhere.org/) and save the script with `UTF-8` encoding. – JosefZ Jul 26 '22 at 16:10
@JosefZ good advice, but Windows makes it difficult sometimes. The key point is that the file encoding must match what is specified by the coding comment. – Mark Ransom Jul 26 '22 at 19:19
This answer is incorrect in one minute detail: the UTF-8 representation is not two characters `~` and `n`; it's still a single character, but it takes two bytes. – Mark Ransom Jul 26 '22 at 19:23
@JosefZ thanks, I updated the answer to reflect this. – whatf0xx Jul 26 '22 at 20:59
The utf-8 representation for character `ñ` (U+00F1, *Latin Small Letter N With Tilde*) is byte sequence `\xc3\xb1`. Interpreted as [mojibake](https://en.wikipedia.org/wiki/Mojibake): `Ã` (U+00C3, *Latin Capital Letter A With Tilde*) and `±` (U+00B1, *Plus-Minus Sign*). Characters `n` (U+006E, *Latin Small Letter N*) and `̃` (U+0303, *Combining Tilde*) are result of [Unicode Normalization](https://www.unicode.org/reports/tr15/), forms `NFD` or `NFKD`. – JosefZ Jul 27 '22 at 18:12
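JosefZ's distinction can be checked with the standard `unicodedata` module, as a short sketch: 'ñ' is one code point (U+00F1) that UTF-8 stores as two bytes; reading those bytes with a single-byte code page such as cp1252 gives the mojibake 'Ã±'; and only NFD normalization splits the character into 'n' plus a combining tilde.

```python
import unicodedata

n_tilde = "ñ"  # precomposed U+00F1

print(hex(ord(n_tilde)), unicodedata.name(n_tilde))  # 0xf1 LATIN SMALL LETTER N WITH TILDE
print(n_tilde.encode("utf-8"))                       # b'\xc3\xb1'

# Mojibake: the two UTF-8 bytes decoded as two cp1252 characters.
print(n_tilde.encode("utf-8").decode("cp1252"))      # Ã±

# Normalization form D decomposes the character into two code points...
decomposed = unicodedata.normalize("NFD", n_tilde)
print([unicodedata.name(c) for c in decomposed])     # ['LATIN SMALL LETTER N', 'COMBINING TILDE']
# ...and form C composes them back into one.
print(unicodedata.normalize("NFC", decomposed) == n_tilde)  # True
```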

score -1 · Answer 2 · answered Jul 27 '22 at 14:24


Thanks to everyone for your replies. I finally fixed it by changing the locale settings under Language.
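The answer doesn't say exactly which setting was changed. If it was the Windows system locale (for example the region option "Beta: Use Unicode UTF-8 for worldwide language support", which switches the legacy ANSI code page to UTF-8), this sketch shows which default encoding Python picks up from the system. It only reports values and changes nothing; which setting the author actually used is an assumption.

```python
import locale
import sys

# Default encoding for open() without an explicit encoding. On Windows with
# Python 3.9 this follows the system ANSI code page (e.g. 'cp1252') unless
# Python's UTF-8 mode is enabled.
print(locale.getpreferredencoding(False))

# 1 if UTF-8 mode is on (for example via `python -X utf8`), else 0.
print(sys.flags.utf8_mode)
```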

answered Jul 27 '22 at 14:24

Jhancir Poveda


Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 28 '22 at 11:52
