
I am getting a UnicodeEncodeError when reading from my users_information dictionary:

{u'\u0633\u062a\u064a\u062f@nimbuzz.com': {'UserName': u'\u0633\u062a\u064a\u062f@nimbuzz.com', 'Code': 5, 'Notes': '', 'Active': 0, 'Date': '12/07/2014 14:16', 'Password': '560pL390T', 'Email': u'yuyb0y@gmail.com'}}

I need to run this code to get a user's information:

def get_users_info(type, source, parameters):
    users_registertion_file = 'static/users_information.txt'
    fp = open(users_registertion_file, 'r')
    users_information = eval(fp.read())
    if parameters:
        jid = parameters+"@nimbuzz.com"
        if users_information.has_key(jid):
            reply(type, source, u"User name:\n" +str(users_information[jid]['UserName'])+ u"\nPassword:\n" +str(users_information[jid]['Password'])+ u"\nREG-code:\nP" +str(users_information[jid]['Code'])+ u"\nDate:\n" +str(users_information[jid]['Date'])+ u"\naccount status:\n " +str(users_information[jid]['Active']))
        else:
            reply(type, source, u"This user " +parameters+ u"  not in user list")
    else:
        reply(type, source, u"write the id after command")

But when I try to get a user's information, I get this error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

I tried encoding the jid with encode('utf8'):

jid = parameters.encode('utf8')+"@nimbuzz.com"

but I get the same error:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

How can I solve this problem? As you can see, the UserName key in the users_information dictionary looks like this:

u'\u0633\u062a\u064a\u062f@nimbuzz.com'

The users_information dictionary is stored in a text file.

Martijn Pieters
Mahmoud Al Nafei

2 Answers


You will not find the user's information unless jid is a unicode string. Make sure parameters is a unicode value; building the key is then easier with string formatting:

jid = u"{}@nimbuzz.com".format(parameters)

If you use an encoded bytestring, Python will not find your username in the dictionary as it won't know what encoding you used for the string and won't implicitly decode or encode to make the comparisons.
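
A minimal sketch of that lookup behaviour, using a one-entry dictionary modelled on the question's data (the sample record and the variable names are illustrative assumptions). The u'' and b'' prefixes keep it runnable on both Python 2.7 and Python 3.3+:

```python
# Sample data modelled on the question; the values are illustrative only.
users_information = {
    u'\u0633\u062a\u064a\u062f@nimbuzz.com': {'Code': 5, 'Active': 0},
}

parameters = u'\u0633\u062a\u064a\u062f'  # assumed to arrive as Unicode text

# Unicode key: equal to the key stored in the dictionary.
text_jid = u"{}@nimbuzz.com".format(parameters)

# UTF-8 encoded bytestring: a different type with different contents.
bytes_jid = parameters.encode('utf8') + b"@nimbuzz.com"

found_with_text = text_jid in users_information
found_with_bytes = bytes_jid in users_information

print(found_with_text)   # True
print(found_with_bytes)  # False
```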

Next, you should not call str() on a Unicode value, because it implicitly encodes the value with the default ASCII codec:

str(users_information[jid]['UserName'])

This is guaranteed to throw a UnicodeEncodeError exception if users_information[jid]['UserName'] contains anything other than ASCII codepoints.
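
In Python 2, str() on a unicode object amounts to an encode with the default codec, which is ASCII unless the interpreter has been reconfigured. The sketch below spells that step out with an explicit .encode('ascii') so that it behaves the same on Python 2 and 3; the sample value is the key from the question:

```python
user_name = u'\u0633\u062a\u064a\u062f@nimbuzz.com'

try:
    # The explicit form of what Python 2's str(user_name) attempts implicitly.
    user_name.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The four Arabic characters at the start cannot be represented in ASCII.
print(str(error))
# 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

# Encoding explicitly with a codec that covers those characters succeeds.
encoded = user_name.encode('utf8')
print(len(encoded))  # 20 (4 characters x 2 bytes, plus 12 ASCII bytes)
```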

You need to use Unicode values throughout and leave encoding to the last possible moment (preferably by leaving it to a library).

You can use string formatting with unicode objects here too:

reply(type, source, 
      u"User name:\n{0[UserName]}\nPassword:\n{0[Password]}\n"
      u"REG-code:\nP{0[Code]}\nDate:\n{0[Date]}\n"
      u"account status:\n {0[Active]}".format(users_information[jid]))

This interpolates the various keys from users_information[jid] without calling str on each value.
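
As a runnable sketch of that formatting call, here is the same template applied to the record from the question. reply() is left out because the question does not show how it works, so the message is only built and inspected; the names record, message and lines are illustrative:

```python
# Record copied from the question's data.
record = {
    'UserName': u'\u0633\u062a\u064a\u062f@nimbuzz.com',
    'Code': 5,
    'Active': 0,
    'Date': '12/07/2014 14:16',
    'Password': '560pL390T',
}

# {0[Key]} looks up record['Key']; no str() call on any value is needed.
message = (
    u"User name:\n{0[UserName]}\nPassword:\n{0[Password]}\n"
    u"REG-code:\nP{0[Code]}\nDate:\n{0[Date]}\n"
    u"account status:\n {0[Active]}"
).format(record)

lines = message.split(u"\n")
print(lines[5])  # P5
```

The result stays a Unicode string, so the non-ASCII user name passes through untouched.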

Note that dict.has_key() has been deprecated; use the in operator to test for a key instead:

if jid in users_information:

Last but not least, don't use eval() if you can avoid it. You should use JSON here for the file format, but if you cannot influence that then at least use ast.literal_eval() on the file contents instead of eval() and limit permissible input to just Python literal syntax:

import ast

# ...

users_information = ast.literal_eval(fp.read())
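
A small sketch of the difference, assuming the file holds a Python dict literal like the one in the question (here a string stands in for fp.read(), and the record is shortened for illustration). ast.literal_eval() parses literals but refuses anything that would execute code:

```python
import ast

# Stand-in for fp.read(); the raw string keeps the \u escapes as they
# would appear in the text file.
file_contents = r"{u'\u0633\u062a\u064a\u062f@nimbuzz.com': {'Code': 5, 'Active': 0}}"

users_information = ast.literal_eval(file_contents)
jid = u'\u0633\u062a\u064a\u062f@nimbuzz.com'
print(jid in users_information)        # True
print(users_information[jid]['Code'])  # 5

# An expression that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```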
Martijn Pieters

I had the same problem years ago:

jid = parameters+"@nimbuzz.com"

must be

jid = parameters+u"@nimbuzz.com"

and put this on the first or second line of the file:

#coding:utf8

Example for Martijn Pieters - on my machine

Python 2.7.8 (default, Jul  1 2014, 17:30:21) 
[GCC 4.9.0 20140604 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=u'asdf'
>>> b='ваывап'
>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
>>> c=u'аыиьт'
>>> a+c
u'asdf\u0430\u044b\u0438\u044c\u0442'
>>>
eri
  • No, concatenating Unicode with an ASCII bytestring is perfectly valid and won't lead to an encoding error. The PEP 263 comment **has no influence on implicit decoding or encoding**, please don't just Cargo Cult the use of such comments. The OP is **not** using non-ASCII source code here. – Martijn Pieters Jul 12 '14 at 10:56
  • `>>> parameters = u'\u0633\u062a\u064a\u062f'`, then `>>> parameters+"@nimbuzz.com"` gives `u'\u0633\u062a\u064a\u062f@nimbuzz.com'`, for example. – Martijn Pieters Jul 12 '14 at 10:58
  • But I can't understand how to use ast.literal_eval(). Do you mean I should use it like ast.literal_eval(fp.read())? – Mahmoud Al Nafei Jul 12 '14 at 11:15
  • I get this error if I use `eval()`: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 12-14: truncated \uXXXX escape – Mahmoud Al Nafei Jul 12 '14 at 11:20
  • `NameError: global name 'ast' is not defined` – Mahmoud Al Nafei Jul 12 '14 at 11:25
  • @user3658169: note that by continuing to comment on eri's post, you are pinging them with what are essentially questions directed at me. I don't get notified of those comments, I just happened to have this page open still. Better comment on my answer or use `@Martijn` to ping me if you respond to my comments here. – Martijn Pieters Jul 12 '14 at 11:34
  • @Martijn I get the same error when I use `eval()`: `SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 12-14: truncated \uXXXX escape` and the list doesn't have any invalid data. – Mahmoud Al Nafei Jul 12 '14 at 11:39
  • @user3658169: Glad to hear! Please do read my earlier comment on accepting an answer that was helpful to you. :-) – Martijn Pieters Jul 12 '14 at 11:49
  • @MartijnPieters Python 2.7.8 (default, Jul 1 2014, 17:30:21) [GCC 4.9.0 20140604 (prerelease)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> a=u'asdf' >>> b='ваывап' >>> a+b Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128) >>> c=u'аыиьт' >>> a+c u'asdf\u0430\u044b\u0438\u044c\u0442' >>> – eri Jul 16 '14 at 10:43
  • @eri: That's because the bytestring *contains non-ASCII codepoints*. I specifically used the term **ASCII bytestring** in my comment. – Martijn Pieters Jul 16 '14 at 10:44
  • @eri: also note that the exception *differs*. You are getting a `UnicodeDecodeError`, **decoding**, the OP got a `UnicodeEncodeError`, **encoding**. – Martijn Pieters Jul 16 '14 at 10:45