
I'm working on a Python 2.7 script that should separate the words in Chinese sentences (which have no spaces between words). I'm running into several problems that I guess are related to encoding:

  • If I run this simple assignment in a script it works just fine, but in the shell I get:

    >>> sentence= '我每天学习'
    Unsupported characters in input
    
  • For some reason, when I remove characters one at a time from the end towards the beginning, once only one character is left ('我') what I get instead is ' 我 '.
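A hedged aside on the second point: in Python 2.7 a plain `'...'` literal is a byte string, and in UTF-8 each of these Chinese characters takes three bytes. The sketch below shows the difference between byte length and character length, and what happens when a slice lands in the middle of a character. The variable names are illustrative and not from the script, and this does not by itself explain the extra spaces around '我'.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; it runs on Python 2.7 and Python 3.
text = u'我每天学习'           # Unicode string: one item per character
data = text.encode('utf-8')    # byte string: three bytes per character here

print(len(text))   # number of characters
print(len(data))   # number of bytes

# Slicing the bytes on a 3-byte boundary gives back a whole character.
first_char = data[:3].decode('utf-8')

# Slicing off the boundary leaves a partial character behind;
# errors='replace' substitutes U+FFFD for the broken part.
broken = data[:4].decode('utf-8', 'replace')
```

If the slices in the script ever fall off a 3-byte boundary, the result is not valid UTF-8 and may display oddly, depending on the terminal or IDLE.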

The loop I'm using to shorten the sentence, dropping the last character each time, is this:

    for i in range(num_characters // 3):
        # drop one more 3-byte character from the end on each pass
        temp = sentence[:num_characters - i * 3]

where num_characters is the number of characters times 3, and temp is the new sentence I'm analyzing.

I'm using UTF-8 encoding in the script, and in theory IDLE is using UTF-8 as well, so I'm kind of lost. Any help would be appreciated.
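One possible way to sidestep the times-3 arithmetic, sketched under the assumption that the input is UTF-8: decode the sentence to a Unicode string first, then shorten it one character at a time. The helper name `prefixes` is hypothetical and not part of the original script.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only; it runs on Python 2.7 and Python 3.

def prefixes(sentence):
    """Yield the sentence, then ever shorter prefixes, down to one character."""
    # Assumption: byte input is UTF-8 encoded. In Python 2.7, bytes is str.
    if isinstance(sentence, bytes):
        sentence = sentence.decode('utf-8')
    # Indexing a Unicode string counts characters, not bytes.
    for end in range(len(sentence), 0, -1):
        yield sentence[:end]

shortened = list(prefixes(u'我每天学习'))
for temp in shortened:
    print(temp)
```

Each `temp` here plays the same role as in the loop above, so it could be looked up in the dictionary directly, provided the dictionary keys are Unicode strings as well.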

  • Possible duplicate of [Unsupported characters in input In Python IDLE](http://stackoverflow.com/questions/20596045/unsupported-characters-in-input-in-python-idle) – DJanssens Nov 28 '15 at 12:59
  • Why the for loop? Maybe a real example would be more helpful. – lord63. j Nov 28 '15 at 13:30
  • The for loop just goes through the sentence, taking one character fewer each time, and tests whether the characters of that iteration match an entry in the dictionary. – user1673162 Dec 05 '15 at 19:02
