I tried to use this code from the NLP UPC research group to retrieve synonyms for given words. When I ran the test method:

def test():
    "tests some functions"
    a=wn.get_words(True)
    print  'length of a: ', len(a)
    print 'a[0]: ', a[0].tostring().decode('utf-8')

the output is garbled, as if the encoding were not recognized:

length of a:  16043
a[0]:  �����

In the same code, UTF-8 encoding is already handled by this helper:

def _encode(data):
    return data.encode('utf8')

The IDE I am using (NetBeans 7.2.1) is configured to support UTF-8 encoding.

How can I solve this problem?

ollo
Abrial
  • Use `repr(a[0].tostring())` instead of `a[0].tostring().decode('utf-8')` and see what gets returned. – Blender Jan 04 '13 at 12:28
  • Thank you for your suggestion, but I still have a problem :( . The output now looks like this: Traceback (most recent call last): File "AWN.py", line 402, in test print 'a[0]: ', repr(a[0].tostring()) AttributeError: 'unicode' object has no attribute 'tostring' – Abrial Jan 04 '13 at 16:30

2 Answers


If your setup is already configured to handle UTF-8, you do not need to decode the string to a Unicode object. If you do decode it, Python has to encode it again on output, using the encoding it detected for sys.stdout.

Try not decoding:

print 'a[0]: ', a[0].tostring()
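
To check which encoding Python detected for the console, something along these lines may help. It is only a sketch: `sample` is a made-up Arabic string standing in for `a[0]`, and `stdout_encoding` and `encode_with` are hypothetical helpers, not part of the AWN code.

```python
import sys

def stdout_encoding():
    """Return the encoding Python detected for sys.stdout (may be None)."""
    return getattr(sys.stdout, "encoding", None)

def encode_with(text, encoding):
    """Encode text, substituting '?' for characters the codec cannot represent."""
    return text.encode(encoding, "replace")

# Hypothetical sample; the real value would come from wn.get_words(True).
sample = u"\u0643\u062a\u0627\u0628"

detected = stdout_encoding()                # e.g. 'UTF-8', 'cp1252', or None
utf8_bytes = encode_with(sample, "utf-8")   # every character survives
ascii_bytes = encode_with(sample, "ascii")  # every character becomes '?'
```

If `detected` is not a UTF-8 variant, the console probably cannot show the text as is, whatever the IDE's editor settings say.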
Martijn Pieters

Thank you for the answers. I used this statement instead, and it worked for me:

print 'a[0]: ', a[0].encode('utf-8')
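
The traceback in the comments suggests that `a[0]` was already a Unicode object rather than an array, which would explain why `.encode('utf-8')` works where `.tostring()` does not. A small helper can cope with both text and byte strings. This is a sketch under that assumption; `to_utf8_bytes` and `word` are hypothetical names, not part of the AWN code.

```python
def to_utf8_bytes(value):
    """Return UTF-8 bytes for either a text string or an existing byte string."""
    if isinstance(value, bytes):
        return value  # already a byte string; leave it untouched
    return value.encode("utf-8")

# Hypothetical stand-in for a[0].
word = u"\u0643\u0644\u0628"

encoded = to_utf8_bytes(word)          # text is encoded to UTF-8 bytes
unchanged = to_utf8_bytes(encoded)     # bytes pass through as they are
roundtrip = encoded.decode("utf-8")    # decoding restores the original text
```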
Abrial