Hebrew appears as gibberish, DB importing with PyPyODBC

Question

I'm trying to work with a Hebrew database, but the output is gibberish. What am I doing wrong?

# -*- coding: utf-8 -*-
import pypyodbc 
conn = pypyodbc.connect('Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\\client.mdb')
cur = conn.cursor()
cur.execute('''SELECT * FROM Client''')
d = cur.fetchone()
for field in d:
    print field

If I look at cur.fetchone():

'\xf0\xf1\xe0\xf8', '\xe0\xe9\xe0\xe3'

Output:

αΘαπ
2001
εδßΘ
αΘ°σ

I'm not too sure about Unicode encodings, but it looks like the data is encoded in something other than UTF-8, or there is some kind of offset between the fields and the Unicode strings. `\xf0` would be the lead byte of a 4-byte UTF-8 sequence, but Hebrew characters should all be 2-byte sequences whose first byte has the binary form `110xxxxx`. – Kyle_S-C, Mar 14 '15 at 00:10
Might it be in [Windows 1255 encoding](https://msdn.microsoft.com/en-gb/goglobal/cc305148)? – Kyle_S-C, Mar 14 '15 at 00:14
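The reasoning in the comments above can be checked directly on the bytes from the question. The following is a minimal sketch in Python 3 syntax (the question itself uses Python 2). It assumes the console that produced the gibberish was using the OEM code page 437, which is a guess that happens to match the output shown; the variable names are illustrative only.

```python
# Byte strings copied from the cur.fetchone() output in the question.
first = b'\xf0\xf1\xe0\xf8'
second = b'\xe0\xe9\xe0\xe3'

# 1) The bytes are not valid UTF-8: 0xF0 announces a 4-byte sequence,
#    but the next byte (0xF1) is not a continuation byte (10xxxxxx).
try:
    first.decode('utf-8')
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# 2) For comparison, a Hebrew letter in UTF-8 is two bytes, lead byte 0xD7.
alef_utf8 = '\u05d0'.encode('utf-8')

# 3) Two candidate interpretations of the same bytes.
as_cp437 = second.decode('cp437')    # assumed console code page (gibberish)
as_cp1255 = second.decode('cp1255')  # Windows Hebrew code page

# ascii() keeps the printout safe on consoles that cannot show Hebrew.
print(is_utf8, alef_utf8, ascii(as_cp437), ascii(as_cp1255))
```

If the code page 437 assumption holds, `as_cp437` reproduces one of the gibberish lines from the question, while `as_cp1255` yields Hebrew letters.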

score 2 · Answer 1 · answered Mar 14 '15 at 00:19

If either of נסאר or איאד is meaningful, then try:

field.decode('cp1255')

Google Translate suggests this might correspond to a person named Iyad Nassar.
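Applied to a whole row, the suggestion above might look like the sketch below. It is written in Python 3 syntax and assumes the driver returns text columns as raw cp1255 byte strings, as the `fetchone()` output in the question suggests. `decode_row` is a hypothetical helper, not part of pypyodbc, and the sample row is assembled from values shown in the question rather than read from a real database.

```python
def decode_row(row, encoding='cp1255'):
    """Decode byte-string fields and leave other values (numbers, None) as-is."""
    return [field.decode(encoding) if isinstance(field, bytes) else field
            for field in row]


# Hypothetical row built from the bytes and the number shown in the question.
row = (b'\xf0\xf1\xe0\xf8', 2001, b'\xe0\xe9\xe0\xe3')

decoded = decode_row(row)
for field in decoded:
    # ascii() avoids encoding errors on consoles that cannot display Hebrew.
    print(ascii(field))
```

The `isinstance` check matters because a row can mix text with non-text columns, such as the `2001` in the question's output, and those have no `decode` method.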

answered Mar 14 '15 at 00:19

Kyle_S-C

It does. I really don't understand why it works on your machine, but on mine I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position..." – RoyEsh Mar 14 '15 at 00:24
I'm using the PyCharm IDE to display the output, with `# coding: utf-8` at the top, like you. It's definitely encoded in Windows 1255 then. It's a bit of a pain, but each byte corresponds to a single Hebrew character or vowel mark. I also have Hebrew installed as a language on Windows so that I could help my partner with a Hebrew language corpus study. – Kyle_S-C Mar 14 '15 at 00:28
Perhaps this might help: https://pythonhosted.org/kitchen/unicode-frustrations.html – Kyle_S-C Mar 14 '15 at 00:42
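The `UnicodeEncodeError` quoted in the comments above is what Python raises when Hebrew text is encoded with the `ascii` codec. Presumably that happens implicitly when the decoded text is printed to a console or pipe that Python treats as ASCII, although the exact cause on the asker's machine is not stated. A minimal sketch in Python 3 syntax, reproducing the error explicitly and showing two ways around it:

```python
# Decoded Hebrew text, starting from bytes shown in the question.
text = b'\xe0\xe9\xe0\xe3'.decode('cp1255')

# Encoding to ASCII fails, because Hebrew letters are outside the ASCII range.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Option 1: encode explicitly to an encoding that can represent Hebrew.
utf8_bytes = text.encode('utf-8')

# Option 2: fall back to escape sequences when the target is ASCII-only.
escaped = text.encode('ascii', 'backslashreplace')

print(ascii_ok, utf8_bytes, escaped)
```

Which option fits depends on where the text is going: a UTF-8 file or terminal can take the first, while an ASCII-only log would need the second.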

score 0 · Answer 2 · answered Mar 13 '15 at 21:49

Try using:

field.encode('utf-8')

answered Mar 13 '15 at 21:49

Roy Shmuli
