
I've searched through so many pages trying to solve this that I'm now more confused about Python 2 and Unicode than I was before I started.

What I'm trying to achieve:

Using the Google Content API v2 for Python, I've written an implementation that takes products from our database and posts them to Google.

This works fine until I get to products that have Unicode characters in their names.

Two example product names, and the errors returned from Google/Python, are:

D' Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53 
Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15 

ERROR'utf8' codec can't decode byte 0x92 in position 1: invalid start byte
ERROR'utf8' codec can't decode byte 0x99 in position 35: invalid start byte

I know it's the ' ® ™ characters, but I can't work out the .encode / .decode aspect of it.

So, can anyone tell me how to handle product names with special characters in them so that I can post them to Google?

== Update == I'm getting the product names from a MySQL database. The table is set to use UTF-8 as its encoding.
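
A note on what the two errors suggest: bytes 0x92 and 0x99 are not valid UTF-8 start bytes, but in Windows-1252 they are the curly apostrophe (’) and the trademark sign (™), and the positions reported (1 and 35) match where those characters sit in the two names. So the strings are probably reaching Python as Windows-1252 bytes, whatever the table encoding says (the connection character set is one possible cause, but that is a guess). Below is a minimal sketch of a workaround, assuming that diagnosis is right. `to_text` is a hypothetical helper, and the byte strings are reconstructed from the error messages rather than taken from the real database.

```python
# -*- coding: utf-8 -*-
# Sketch only: `to_text` is a hypothetical helper, not part of any Google or MySQL library.

def to_text(raw_bytes):
    """Decode a byte string, preferring UTF-8 and falling back to Windows-1252."""
    try:
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        return raw_bytes.decode("cp1252")

# The two product names as the bytes implied by the error messages (an assumption).
raw_names = [
    b"D\x92Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53",
    b"Fender Stop Dreaming, Start Playing\x99 Affinity P Bass\xae With Rumble\x99 15",
]

# Decode once at the boundary, work with text, encode to UTF-8 only when sending.
titles = [to_text(name) for name in raw_names]
payloads = [title.encode("utf-8") for title in titles]
```

The idea is to decode once where the data enters the program and encode to UTF-8 only at the point of sending. If the root cause does turn out to be the database connection, fixing it there would be cleaner than guessing the encoding per string.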

Alex Hellier

2 Answers


Try:

u'Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53'
u'Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15'

or

unicode('Addario EXP11 Coated Bronze Acoustic Guitar Strings, 12-53')
unicode('Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15')

That aside, Unicode support in Python 2 is a pain a lot of the time. I recommend trying Python 3, where strings are Unicode by default.

Yurippenet
  • I tried the unicode() approach and it didn't seem to work; I seem to remember getting other errors. – Alex Hellier Jun 05 '15 at 15:09
  • I'll try to dig out the errors I get when I post via the unicode() method. I was attempting to convert it to Python 3 as well! – Alex Hellier Jun 05 '15 at 15:09
  • >>> mytext = unicode('Fender Stop Dreaming, Start Playing™ Affinity P Bass® With Rumble™ 15') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 35: ordinal not in range(128) – Alex Hellier Jun 05 '15 at 15:13
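
The traceback in the last comment is what Python 2 does when unicode() is given a byte string without an encoding: it falls back to ASCII, and byte 0xe2 (the first byte of ™ in UTF-8) is outside that range. Naming the encoding avoids the error. The sketch below uses .decode() so that it runs on both Python 2 and Python 3. The byte string is an assumption about what a UTF-8 terminal would have passed in.

```python
# -*- coding: utf-8 -*-
# The product name as UTF-8 bytes (assumed: typed into a UTF-8 terminal).
utf8_bytes = (
    b"Fender Stop Dreaming, Start Playing\xe2\x84\xa2 "
    b"Affinity P Bass\xc2\xae With Rumble\xe2\x84\xa2 15"
)

# Decoding as ASCII, which is what a bare unicode(...) call does in Python 2,
# fails on the first non-ASCII byte (0xe2 at position 35).
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the real encoding works. In Python 2, unicode(utf8_bytes, "utf-8")
# is equivalent to this call.
title = utf8_bytes.decode("utf-8")
```

This only helps when the bytes really are UTF-8. The 0x92 and 0x99 errors in the question point to a different encoding for the data coming from the database, so the codec passed to decode has to match where the bytes came from.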

Python 3 is the answer :) (Google now supports it in their SDK.)

Alex Hellier