
I am trying to work with the HORRIBLE web services at Commission Junction (CJ). I can get the client to connect and receive information from CJ, but their database seems to include a bunch of bad characters that cause a UnicodeDecodeError.

Right now I am doing:

from suds.client import Client
wsdlLink = 'https://link-search.api.cj.com/wsdl/version2/linkSearchServiceV2.wsdl'
client = Client(wsdlLink)
result = client.service.searchLinks(developerKey='XXX', websiteId='XXX', promotionType='coupon')

This works fine until I hit a record containing something like 'CorpNet® 10% Off Any Service'. The ® causes it to break, and I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 758: ordinal not in range(128)

Is there a way to encode the ® on my end so that it does not break when SUDS reads in the result?

UPDATE: To clarify, the ® comes from the CJ database and is in their response. So somehow I need to decode the non-ASCII characters BEFORE SUDS processes the response. I am not sure how (or if) this can be done in SUDS.
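One general way to "fix the data before the parser sees it" is to rewrite every non-ASCII character in the raw reply as an XML numeric character reference (® becomes `&#174;`). The result is pure ASCII, so code that wrongly decodes it as ASCII no longer fails, while a conforming XML parser still turns the references back into the original characters. This is a Python 3 sketch under assumptions: the reply is UTF-8 encoded XML, non-ASCII appears only in text content or attribute values (not in tag names), and `ascii_safe_reply` is a hypothetical helper, not part of suds. As far as I know, suds releases that include the plugin API (0.4 onward) expose the raw reply as `context.reply` in a `suds.plugin.MessagePlugin` subclass's `received(context)` hook, which is where such a function could be applied; that wiring is not shown or tested here.

```python
import xml.etree.ElementTree as ET


def ascii_safe_reply(raw_reply: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: replace each non-ASCII character in an XML reply
    with a numeric character reference, e.g. '®' -> '&#174;'.

    Assumes the reply really is in `encoding` and that non-ASCII characters
    only occur in text or attribute values, never in tag names.
    """
    text = raw_reply.decode(encoding)
    return text.encode("ascii", "xmlcharrefreplace")


# Made-up reply fragment standing in for the CJ response.
raw = "<link><name>CorpNet® 10% Off Any Service</name></link>".encode("utf-8")

cleaned = ascii_safe_reply(raw)
print(cleaned)                                  # b'<link><name>CorpNet&#174; 10% Off Any Service</name></link>'
print(ET.fromstring(cleaned).findtext("name"))  # CorpNet® 10% Off Any Service
```

The character is preserved end to end; only its serialized form changes, so nothing is lost by the rewrite.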

bluish
chris
  • make sure that you don't mix `str` and `unicode` objects e.g., `u'a'+'®'` will cause the error. Decode input to Unicode as early as possible. – jfs Jan 16 '11 at 04:49

3 Answers

Implicit UnicodeDecodeErrors are what you get when you try to add str and unicode objects. Python will then try to decode the str into unicode, but using the ASCII encoding. If your str contains anything that is not ASCII, you will get this error.

Your solution is to decode it manually, like so:

thestring = thestring.decode('utf8')
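For readers on Python 3, the same rule still applies, but the failure looks different: Python 3 refuses to mix text (`str`) and `bytes` at all and raises `TypeError`, instead of silently trying an ASCII decode the way Python 2 did for `u'a' + '®'`. This minimal sketch assumes the raw value arrives as UTF-8 bytes; `raw_title` is an illustrative name, not something suds returns.

```python
# Pretend these bytes were just handed to us by a library (assumed UTF-8).
raw_title = "CorpNet® 10% Off Any Service".encode("utf-8")

try:
    label = "Deal: " + raw_title          # mixing text and bytes
except TypeError as exc:
    print("mixing failed:", exc)          # Python 3 refuses instead of guessing ASCII

title = raw_title.decode("utf-8")         # decode explicitly, as early as possible...
label = "Deal: " + title                  # ...then work only with text
print(label)                              # Deal: CorpNet® 10% Off Any Service
```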

Try, as much as possible, to decode any string that may contain non-ASCII characters as soon as you are handed it by whatever module you get it from, in this case suds.

Then, if suds can't handle Unicode (which may be the case), make sure you encode it back just before handing the text back to suds (or any other library that breaks if you give it unicode).

That should solve things nicely. It may be a big change, as you need to move all your internal processing from str to unicode, but it's worth it. :)
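The "decode on the way in, encode on the way out" pattern described in this answer can be sketched as two thin boundary functions. Everything here is hypothetical and written for Python 3: `legacy_fetch` and `legacy_send` stand in for a library that only speaks UTF-8 bytes (the role suds plays in this thread); they are not suds APIs.

```python
# Hypothetical stand-ins for a bytes-only library (NOT real suds functions).
def legacy_fetch() -> bytes:
    return "CorpNet® 10% Off Any Service".encode("utf-8")


def legacy_send(payload: bytes) -> int:
    if not isinstance(payload, bytes):
        raise TypeError("legacy_send only accepts bytes")
    return len(payload)  # pretend "sending" just reports the byte count


def fetch_text() -> str:
    """Decode at the boundary: everything past this point is text."""
    return legacy_fetch().decode("utf-8")


def send_text(text: str) -> int:
    """Encode at the boundary, just before handing data back to the library."""
    return legacy_send(text.encode("utf-8"))


title = fetch_text()
processed = title.upper()      # all internal processing happens on text
print(processed)               # CORPNET® 10% OFF ANY SERVICE
print(send_text(processed))    # byte count; ® takes two bytes in UTF-8
```

Keeping the conversions in exactly two places means the rest of the program never has to ask whether a value is bytes or text.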

bluish
Lennart Regebro
  • Lennart, the issue is that the non-ASCII characters are actually in the CJ database and in the response they send. So I'm not sure how I can decode their response before SUDS tries to parse it and throws the error. I need some way to send the request, decode the response, and then parse the response, but I do not see a way to do this in SUDS. – chris Jan 16 '11 at 08:57
  • @chris: So the error happens in suds, even before your code ever handles the data? In that case it's a bug, either in suds or in the server. Perhaps the server sends data encoded in UTF when it claims it's something else? – Lennart Regebro Jan 16 '11 at 09:05
  • Lennart: correct. I am pretty sure it is happening at the server (which I cannot control). Commission Junction do not seem to support the web services, and I was hoping there was some way to correct the data before it gets fed back into SUDS. I thought it was a long shot, but thought I might be missing something. – chris Jan 16 '11 at 09:10
  • @chris: Come to think of it, since it uses the ascii decoder when it fails, I think it's more likely to be a suds bug. You'll have to check on a suds mailing list. – Lennart Regebro Jan 16 '11 at 09:16
  • @Lennart you are 100% correct. Last night I started digging around in SUDS and was able to patch it so that everything works now. Thanks very much for your help. – chris Jan 16 '11 at 18:03
  • @chris: Please report the bug and submit your patch at https://fedorahosted.org/suds/ – jfs Jan 16 '11 at 18:18
  • @chris, would you please tell me what patch you used to solve this problem? I'm having a very similar issue with unicode here: http://stackoverflow.com/questions/15339141/suds-0-4-cant-handle-unicode-xml-sax-exceptions-saxparseexception-unknown . Thanks in advance! – securecurve Mar 14 '13 at 07:35
The "registered" character is U+00AE and is encoded as "\xc2\xae" in UTF-8. It looks like you have a str object encoded in UTF-8 but some code is doing (probably by default) your_str_object.decode("ascii") which will fail with the error message you showed.
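The byte-level claim is easy to check. This Python 3 sketch encodes U+00AE to UTF-8 and then decodes the bytes with the ASCII codec, which reproduces the error from the question. The offset differs (7 here, 758 in the real response) only because the reported position is counted from the start of whatever buffer is being decoded.

```python
registered = "\u00ae"                    # ®, REGISTERED SIGN
encoded = registered.encode("utf-8")
print(encoded)                           # b'\xc2\xae'

# Decoding UTF-8 bytes as ASCII fails at the first byte >= 128 (here 0xc2).
data = "CorpNet® 10% Off Any Service".encode("utf-8")
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc2 in position 7: ordinal not in range(128)
```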

What you need to do is show us a complete example (i.e. ALL the code necessary to get the error), plus the full error message and traceback, so that at least we can guess whether the problem is in your code or in imported code.

John Machin
  • To be clear: the data that is causing the error is in the reply from the web service. I send a request that works, but the issue happens when CJ replies with a "registered" character in the response. So what I need to do is somehow clean the character BEFORE SUDS tries to parse it. As for the code, what you see above is all you need to do with SUDS to get a web service response and the SUDS error. – chris Jan 16 '11 at 08:50
  • @chris: All **you** need to do is what everybody should do when asking about a problem that is raising an exception: run the minimal code necessary to cause the problem, and copy/paste the full error message and traceback into an edit of your question. By the way, how do you know that it's a "registered" character in the response? – John Machin Jan 16 '11 at 09:52
  • John, the code I ran is what I put in the question, and the error is "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 759: ordinal not in range(128)". Not sure what you are asking for beyond that. I know it is the "registered" character because I can use another SOAP client that does not break on non-ASCII characters, and I can see that the results that break include the "registered" character in the response. When the results do not include the "registered" character, they come back correctly. The issue ended up being in SUDS; I did a quick-and-dirty patch. Thanks – chris Jan 16 '11 at 18:01
I am using SUDS to interface with Salesforce via their SOAP API. I ran into the same situation until I followed @J.F.Sebastian's advice and stopped mixing str and unicode string types. For example, passing a SOQL string like this does work with SUDS 0.3.9:

qstr = u"select Id, FirstName, LastName from Contact where FirstName='%s' and LastName='%s'"  % (u'Jorge', u'López')

I did not seem to need to do str.decode("utf-8") either.

If you're running your script from PyDev on Eclipse, you might want to go into Project => Properties and, under Resource, set "Text File Encoding" to UTF-8; on my Mac, this defaults to "MacRoman". I suppose on Windows the default is either Cp1252 or ISO-8859-1 (Latin-1). You could also set this in your Workspace and have your Projects inherit the setting from it. This only affects the program source code.
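Why the editor's file encoding matters: a literal such as u'López' is stored on disk as bytes, and whoever reads the file back (the editor, or the interpreter) must decode those bytes with the same encoding that wrote them. This Python 3 sketch shows the UTF-8 bytes for 'López' read back under a few encodings. The coding comment on the first line is the PEP 263 source-encoding declaration that Python 2 needs before it accepts non-ASCII literals; Python 3 already assumes UTF-8 source.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration above: required in Python 2 for non-ASCII literals,
# harmless (already the default) in Python 3.

utf8_bytes = "López".encode("utf-8")   # what a UTF-8 editor writes: b'L\xc3\xb3pez'

# Reading the same bytes back with a different assumed encoding garbles them.
print(utf8_bytes.decode("cp1252"))     # LÃ³pez
print(utf8_bytes.decode("mac_roman"))  # also garbled, with different characters
print(utf8_bytes.decode("utf-8"))      # López
```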

Martijn Pieters
Chris Wolf