-1

So this is an odd question, but I'm trying to process Bengali characters like খ (I tried with Arabic و and Japanese 片 as well) in VS Code, and all was going well until I suddenly got this error:

SyntaxError: Non-UTF-8 code starting with '\xe0' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Note: When using the Arabic character و and the Japanese character 片, I got similar errors but with different leading bytes: "\xd9" and "\xe7" respectively.
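The escapes in these messages line up with the first byte of each character's UTF-8 encoding. A minimal sketch to check that (the helper name `first_utf8_byte` is made up for illustration):

```python
def first_utf8_byte(ch: str) -> str:
    """Return the first byte of ch's UTF-8 encoding, written like a Python escape."""
    return "\\x{:02x}".format(ch.encode("utf-8")[0])


# Print each character, its full UTF-8 byte sequence, and the leading byte.
for ch in ("খ", "و", "片"):
    print(ch, ch.encode("utf-8"), first_utf8_byte(ch))
```

So the message is pointing at the start of a perfectly ordinary UTF-8 sequence, which fits the observation that the text itself is not malformed.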

My code is not the problem, because it's simply text = [long foreign language text], and that alone gives me the error. However, through some experimenting, I noticed that the error only appears once I exceed 167 foreign-language characters (the same for Japanese, but for Arabic the threshold was higher).

To find that limit, I created a string (without spaces) of only খ and kept incrementing the number of characters till I got the error. At 167 characters (as per this character count website), everything worked fine. But as I added another character (total 168 characters), the above error was thrown.
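Since the Arabic threshold differs from the Bengali and Japanese one, it may be worth counting bytes rather than characters. The sketch below is only a probe, not a diagnosis. It assumes the line is exactly `text = "…"` saved as UTF-8, and `line_bytes` is a hypothetical helper:

```python
def line_bytes(ch: str, count: int) -> int:
    """UTF-8 byte length of the source line: text = "<ch repeated count times>"."""
    line = 'text = "' + ch * count + '"'
    return len(line.encode("utf-8"))


# Bengali খ needs 3 bytes per character in UTF-8.
for count in (167, 168):
    print(count, "Bengali characters:", line_bytes("খ", count), "bytes")

# Arabic و needs only 2 bytes per character, so the same number of
# characters gives a noticeably shorter line in bytes.
print(168, "Arabic characters:", line_bytes("و", 168), "bytes")
```

If the failing and passing cases for all three scripts sit on either side of the same byte length, that would suggest the limit depends on the encoded size of the line rather than on the character count.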

The common answers to this question in other Stack Overflow posts such as this and this don't seem to work for me. That is likely because this doesn't really sound like an encoding problem. If it were an encoding problem, it should have thrown an error regardless of the length of the string, right?

I tried to replicate this in the Spyder IDE and it doesn't seem to have any such problems or limits. That leads me to believe this is a VS Code problem. Is anyone familiar with such issues or knows how to solve them in VS Code?

I like working in VS Code so I'd rather not have to change just for this.

My whole code, if it matters:

# (167 Characters) Gives no error in VS Code
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)


# (168 Characters) Gives error in VS Code but not in Spyder IDE
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)

The traceback, in case it matters, is:

File "filename.py", line 5
SyntaxError: Non-UTF-8 code starting with '\xe0' in file filename.py on line 16, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Edit: I tried adding # coding: utf-8 at the top of the file, but it still caused the error in my VS Code.


codingray
  • Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot, not a link to an external portal). It contains other useful information. – furas Aug 20 '21 at 10:17
  • Do you run the code with the same Python? Maybe one of them runs with `Python 2`. Or maybe one IDE uses the encoding `cp1252` and that causes the problem. Check whether you can run it directly in the console (without an IDE): `python script.py` – furas Aug 20 '21 at 10:19
  • Could you please spell out which character set exactly you use in the file? Are all the characters valid UTF-8 according to other tools? – tripleee Aug 20 '21 at 19:11
  • @snakecharmerb Looked through and played around with anything that may relate to line wrapping but no dice sadly. Didn't seem to find much anyway – codingray Aug 24 '21 at 03:43
  • @snakecharmerb @furas The traceback wasn't particularly useful as compared to what I already wrote I felt. So I didn't include. It is `File "filename.py", line 5 SyntaxError: Non-UTF-8 code starting with '\xe0' in file filename.py on line 16, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details`. Note that this error doesn't show up if I use 1 character less than 168 (for bengali and japanese). And yes, I only have one version of Python on my system at the moment. – codingray Aug 24 '21 at 03:46
  • @tripleee yes, they are valid on spyder with the same python version but not on VS Code. I am not sure what you mean by character set but since they work perfectly on Spyder IDE, I assume it should be valid for other tools as well. – codingray Aug 24 '21 at 03:48
  • The [Stack Overflow `character-encoding` tag info page](http://stackoverflow.com/tags/character-encoding/info) has a brief intro to the topic. If a single tool accepts these characters, I would blame that tool at least until I can get a third opinion. If you are on Windows, its native tools are regrettably (but unsurprisingly) untrustworthy, but still, what does e.g. Notepad think about the file? Or if you are on a real computer, try `iconv -t utf16-le file >/dev/null`; if it works, the file is valid UTF-8 throughout. – tripleee Aug 24 '21 at 04:16
  • @tripleee It's not really a file. It's just a string placed directly into the program as shown in the code snippet above. My other tools, both on Windows and my real computer (Linux), seem to be able to process that string (and other similar strings) without issue. I really think it's just a VS Code issue, so I've been using others for the time being. – codingray Aug 24 '21 at 04:38
  • Your source code needs to be a file in order for Python to execute it, as also witnessed by the `filename.py` in the traceback. – tripleee Aug 24 '21 at 04:47
  • But yeah, sounds like you found a bug in VS Code. – tripleee Aug 24 '21 at 04:47
  • Yes, it seems to be a bug of VSCode. Even in 2022 June, this problem still occurs if you don't add something like `# coding=utf8` at the top of the code. Seems like it depends on the length of the string. This error occurs even if the comment contains some long length of foreign characters. – starriet Jun 25 '22 at 02:17
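The validity check that tripleee suggests with `iconv` can also be approximated in Python itself. A minimal sketch, where `first_invalid_utf8` is a hypothetical helper and the demo uses in-memory bytes rather than the real file:

```python
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# In-memory demo; for a real script, pass open("filename.py", "rb").read().
good = 'text = "খখখ"\n'.encode("utf-8")
bad = good[:-3]  # drop the newline, the quote, and the last byte of the final খ

print(first_invalid_utf8(good))
print(first_invalid_utf8(bad))
```

If this returns `None` for the saved script, the bytes on disk are valid UTF-8 and the problem lies in how the file is being read or run, not in its contents.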

2 Answers

2

Could you try to add this at the beginning of the file:

# coding:utf-8

Update:

It seems that the length of the string, and even the variable name, can trigger the error Non-UTF-8 code starting with '\xe0' in xxx on line xxx, but no encoding declared;

It's confusing. I get the error Non-UTF-8 code starting with '\xe0' with this code:

text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text2)

text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)

text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text3)

While this works, where I only changed text to text5 without changing anything else:

text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text2)

text5 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text5)

text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text3)

This does not work either:

text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text2)

But if I only add some lines, it works:

text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text2)


text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)

And this does not work either:

text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)

text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"

print(text)

All of the problems mentioned above can be solved with # coding:utf-8 or # -*- coding: utf-8 -*-
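Both spellings match the declaration pattern from PEP 263, so the standard library reports the same encoding for either one. This can be checked with `tokenize.detect_encoding`. The sketch only shows what Python's own detection returns and does not explain why one form might appear to behave differently in VS Code; `declared_encoding` is an illustrative helper:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding that the tokenize module detects for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Two declaration styles, plus a file whose first line has no declaration at all.
for first_line in (b"# coding:utf-8\n", b"# -*- coding: utf-8 -*-\n", b"\n"):
    print(first_line, "->", declared_encoding(first_line + b"print('hi')\n"))
```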

Steven-MSFT
  • Sorry for the late reply. I tried but that doesn't help. From what I have read from other users, Python 3 (which is what I'm using) defaults to UTF-8. In any case, the problem doesn't seem to be interpretation as much as it seems to be related to how many characters are being interpreted. – codingray Aug 24 '21 at 03:40
  • @codingray I have tested it locally, and it works well, you can have a look at it in my update. Could you provide a screenshot of it? – Steven-MSFT Aug 24 '21 at 06:32
  • I tried (update provided in the question) and the error seems to persist – codingray Aug 24 '21 at 06:49
  • @codingray Could you try this: `# -*- coding: utf-8 -*-` instead of `# coding:utf-8`? – Steven-MSFT Aug 24 '21 at 07:15
  • Oh wow ok it works with `# -*- coding: utf-8 -*-`! Thank you so much! Is there a reason why? – codingray Aug 24 '21 at 10:08
  • @codingray Could you check the update in the answer? I can not find out the reason for it, it's confusing. – Steven-MSFT Aug 25 '21 at 02:41
  • Seems like a bug of VSCode. This occurs even when the string is inside of a comment. – starriet Jun 25 '22 at 02:20
-1

I was able to make your error go away by specifying the encoding at the top of the file. Specifically, add this line to the top of your file:

# -*- coding: cp1252 -*-

Python 2 uses ASCII as the default source encoding (Python 3 defaults to UTF-8), but this line changes the encoding to cp1252. cp1252 is a Western European encoding; it does not cover Arabic (the Windows code page for Arabic is cp1256), so this declaration silences the error without decoding the text as the intended characters. Shift-JIS is a common legacy encoding for Japanese, but I have not tried it.
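A word of caution on this approach: declaring cp1252 for a file that was saved as UTF-8 does not recover the original characters, it only maps each byte to some cp1252 character. A small sketch of the effect (the helper name `read_as_cp1252` is illustrative):

```python
def read_as_cp1252(ch: str) -> str:
    """Encode ch as UTF-8 (as stored on disk), then decode the bytes as cp1252."""
    raw = ch.encode("utf-8")
    return raw.decode("cp1252")


# Each character turns into two or three unrelated Western European symbols.
for ch in ("খ", "و", "片"):
    print(repr(ch), "->", repr(read_as_cp1252(ch)))
```

The script would run, but `print(text)` would show the substituted symbols rather than the Bengali, Arabic, or Japanese text.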

Luke Darcy