
I am using the awesome Requests module to test an API I've created for one of our internal projects. I believe I have discovered what is either a flaw in the Requests module itself, or a flaw in my usage of it.

Because our data is not super sensitive, our API uses simple, basic HTTP authentication to control access. When I make requests to the API URL, using JSON as the data format and either urllib2 with HTTPBasicAuthHandler or PHP and cURL, I get my data back as a properly formatted JSON string, no problem.

However, when I make the same request using the Requests module, I get back an encoded string, and I cannot determine what type of encoding it is. Here is a snippet of the beginning of that string:

```
\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xadZ\xfb\x8f\xd3H\x12\xfeWzG\xab;\x90
```

Here are the few lines of code I am using with Requests to reproduce this issue:

```python
import requests
# api_user and api_pw not printed here for security reasons
r = requests.get('http://ourdomain.com/api/featured/school/json', auth=(api_user, api_pw))
status = r.status_code # Produces 200 every time
rawdata = r.read()
print rawdata
```

I get that encoded string every time I run it.

Can anyone help me determine (a) what encoding that is (for my own edification), and (b) why Requests is returning data in that encoding, and how to decode and/or "fix" it?

Thanks in advance!

Piotr Dobrogost
tommytwoeyes

1 Answer


Out of curiosity, what do you get when you print `r.content`?
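For reference on part (a) of the question: the bytes `\x1f\x8b` at the start of the string are the gzip magic number, `\x08` marks the deflate compression method, and the following `\x00\x00\x00\x00\x00\x00\x03` are the remaining gzip header fields (flags, timestamp, extra flags, and OS = Unix). So the body appears to be gzip-compressed rather than text-encoded; presumably the production server sent it with a `Content-Encoding: gzip` header, although the headers are not shown in the question. `.content` undoes that compression, while reading the raw stream returns the compressed bytes. The sketch below is a minimal Python 3 illustration; `decode_body` is a hypothetical helper and the JSON payload is made up.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def decode_body(raw: bytes) -> bytes:
    """Hypothetical helper: gunzip raw if it starts with the gzip magic number."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw  # not gzip; return unchanged


# Made-up payload standing in for the API's JSON response.
payload = json.dumps({"school": "example", "featured": True}).encode("utf-8")
compressed = gzip.compress(payload)

print(compressed[:3])  # b'\x1f\x8b\x08', the same prefix as the string in the question
print(json.loads(decode_body(compressed)))
```

In current Requests releases, the undecoded stream is exposed as `r.raw` (readable when the request is made with `stream=True`), and `r.content` or `r.json()` are the usual ways to get the decompressed body.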

Hank Gay
  • That's interesting! I didn't even see that method when I ran `dir(r)`. That outputs the JSON string. Is that the method I should be calling instead of `read()`? – tommytwoeyes Aug 09 '11 at 16:12
  • Upon further reflection, I can see how that might be a misuse of the library on my end (i.e., I should have called `r.content` instead of `r.read()`), but it doesn't explain why the output differs between my development virtual machine (which, with all other factors being the same, outputs the JSON string when calling `r.read()`) and the production box (which outputs that encoded string). Any ideas why the output is different? – tommytwoeyes Aug 09 '11 at 16:16
  • @waveslider I don't know anything about requests other than that it's on my list of things to look into, but at a guess I'd say it has to do with default encodings. Your dev box is probably UTF-8 (which all JSON is supposed to be) and the server is something else. I'm guessing the `.content` property is looking at all the encoding headers, etc. and applying them, while `.read()` is just pulling the bytes off the wire, and since it's encoded differently, you get the bytes. Again, all of that is just guessing. – Hank Gay Aug 09 '11 at 17:37
  • Thanks, man! That sounds like a pretty good guesstimate to me. How would I go about determining the default encoding of my production box (CentOS 5)? Is that a Python configuration variable or a configuration of the OS? – tommytwoeyes Aug 09 '11 at 17:42
  • Your OS does have a default encoding, but I don't know exactly how Python interacts with that. I'm almost positive there's a way to override it, but I don't know it off the top of my head. It might help to read [the Unicode HOWTO](http://docs.python.org/howto/unicode.html). The best solution is probably to use [`.content`](http://readthedocs.org/docs/requests/en/v0.5.0/api/#requests.models.Response.content), since that is working and is the way the example code works. – Hank Gay Aug 09 '11 at 17:55
  • Thank you - you're right, so I'll just use .content. It's easier. – tommytwoeyes Aug 09 '11 at 19:15
  • Yes, Python does get the default encoding from the system. It depends on the Python version and the platform and configuration. Here's a great resource for in-depth information: http://farmdev.com/talks/unicode/ – Kenneth Reitz Aug 11 '11 at 04:21
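On the question of how to check the default encoding on the production box, here is a minimal Python 3 sketch of the standard-library calls that report the relevant defaults. `sys.getdefaultencoding()` is Python's own internal default, while `locale.getpreferredencoding()` reflects the OS locale (on Linux distributions such as CentOS, typically set through `LANG` or `LC_ALL`). These settings govern how text is encoded and decoded; they do not compress or decompress HTTP bodies, so comparing them between the dev VM and production is a diagnostic step rather than a guaranteed explanation.

```python
import locale
import sys

# Python's own default for implicit str/bytes conversions
# ('ascii' on Python 2, always 'utf-8' on Python 3).
default_enc = sys.getdefaultencoding()

# Encoding implied by the OS locale (LANG / LC_ALL / LC_CTYPE on Linux).
locale_enc = locale.getpreferredencoding(False)

# Encoding used for text written to stdout (None if stdout lacks the attribute).
stdout_enc = getattr(sys.stdout, "encoding", None)

print("sys.getdefaultencoding():", default_enc)
print("locale.getpreferredencoding(False):", locale_enc)
print("sys.stdout.encoding:", stdout_enc)
```

Running this on both machines and diffing the output is a quick way to see whether their locale configuration differs.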