Issue with Unicode and Chinese characters in Python

Question

I'm working with Python 2.7 on a script that separates the words in Chinese sentences (which have no spaces between words). I'm running into several problems that I suspect are related to encoding:

If I run this simple assignment in a script it works just fine, but in the shell I get:
```
>>> sentence= '我每天学习'
Unsupported characters in input
```
For some reason, as I remove characters from the end towards the beginning, once only one character is left ('我') what I get in its place is ' æˆ‘ '.
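
The ' æˆ‘ ' output is consistent with the three UTF-8 bytes of '我' being displayed through a single-byte code page. Here is a minimal sketch that reproduces it, assuming the fallback is Windows-1252 (an assumption; the code page actually in use on this machine is not known):

```python
# -*- coding: utf-8 -*-

char = u'我'                          # one character of text
utf8_bytes = char.encode('utf-8')     # its UTF-8 form: three bytes, E6 88 91

# Assumption: something in the chain reads those bytes as Windows-1252,
# turning each byte into a separate character.
garbled = utf8_bytes.decode('cp1252')

print(len(char))                      # number of characters
print(len(utf8_bytes))                # number of bytes
print(garbled == u'æˆ‘')              # whether the garbling matches the symptom
```

If that is what is happening, the data itself is intact and only the interpretation of the bytes is wrong.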

The loop I'm using to shorten the sentence, dropping the last character each time, is this:

    for i in range(num_characters // 3):
        temp = sentence[:num_characters - i * 3]

where num_characters is the number of characters times 3, and temp is the new sentence I'm analyzing.
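
The factor of 3 suggests the sentence is being handled as a UTF-8 byte string, where each of these Chinese characters takes three bytes. As a hedged sketch, the same shortening loop can be written against decoded text so that one index is one character. The helper name `shrinking_prefixes` is hypothetical and used only for illustration:

```python
# -*- coding: utf-8 -*-

def shrinking_prefixes(sentence):
    """Yield the full sentence, then each shorter prefix, down to one character.

    `sentence` must be decoded text (unicode in Python 2, str in Python 3),
    so that slicing counts characters instead of UTF-8 bytes.
    """
    for end in range(len(sentence), 0, -1):
        yield sentence[:end]

# The u prefix makes this a text string rather than a byte string in Python 2.7.
sentence = u'我每天学习'
prefixes = list(shrinking_prefixes(sentence))

print(len(sentence))    # characters in the sentence
print(len(prefixes))    # one prefix per character
```

If the sentence comes from a file or another byte source, decoding it first with `.decode('utf-8')` gives the same kind of text string.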

I'm using UTF-8 encoding in the script and, in theory, IDLE is using UTF-8 as well, so I'm kind of lost. Any help would be appreciated.

Possible duplicate of [Unsupported characters in input In Python IDLE](http://stackoverflow.com/questions/20596045/unsupported-characters-in-input-in-python-idle) — DJanssens, Nov 28 '15 at 12:59
Why the for loop? Maybe a real example would be more helpful. — lord63. j, Nov 28 '15 at 13:30
The for loop just goes through the sentence, taking one character fewer each time, and tests whether the set of characters in that iteration matches an entry in the dictionary. — user1673162, Dec 05 '15 at 19:02
