
I have a string that I compress with zlib and store in a dictionary, and then I create an MD5 hash of the whole dictionary. But I get this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 1: invalid start byte

The code is:

import hashlib, json, zlib
data['extras'] = zlib.compress("My string".encode("utf-8"))  # the string is very large, which is why it needs to be compressed to save memory
checkup['hash'] = hashlib.md5(json.dumps(dict(data), sort_keys=True)).hexdigest()

The dictionary is something like:

{'extras':'x\x9cK\x04\x00\x00b\x00b'}
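The error comes from the compressed value itself: `zlib.compress` returns raw binary bytes, and Python 2's `json.dumps` tries to decode byte strings as UTF-8 before serializing them. A minimal Python 3 sketch of why that decode fails (in Python 3, `json.dumps` rejects `bytes` outright with a `TypeError` instead):

```python
import zlib

# zlib output is raw binary data, not text.
compressed = zlib.compress("My string".encode("utf-8"))

# With the default settings the stream starts with the zlib header bytes 0x78 0x9c.
print(compressed[:2])

message = None
try:
    # UTF-8 decoding fails at position 1: 0x9c is a continuation byte,
    # so it cannot start a character.
    compressed.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The first byte, 0x78, happens to be the ASCII letter `x`, which is why the error points at position 1.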

Can anyone tell me how I can dump this dictionary/string to JSON?

The string is a long JSON document, something like:

{
    "listing": {
            "policies": null,
            "policy_explanation": "Some Text",
            "policy_name": "Flexi3",
            "updated": "7 weeks ago",
            "city": "Bengaluru",
            "country": "India",
             .
             .
             .   
}
Rahul
  • Have you tried adding the utf8 header? Are you using py2? – taesu Dec 17 '15 at 20:29
  • Yup. Same result. The error is in this line as per the logs `checkup['hash'] = hashlib.md5(json.dumps(dict(data), sort_keys=True)).hexdigest()` – Rahul Dec 17 '15 at 20:32
  • py2 str is such a headache. What's the character in question? Can you post part of the str? – taesu Dec 17 '15 at 20:34
  • Well, it worked after adding `base64.b64encode` after compressing, as suggested by @Turn. But the length of the string increased by about 33% (12272 vs. 9204). Is there any alternative solution that doesn't eat up the extra space? – Rahul Dec 17 '15 at 20:39
  • JSON doesn't support binary data, so something like `base64` is about the best you can do... unless you decide not to compress the string inside your dict, but instead compress the whole JSON string once it's been dumped. If you are sending it over the web, for instance, you'd gzip-encode the entire JSON payload. – tdelaney Dec 17 '15 at 20:44
  • Thanks for the suggestion, @tdelaney. That clears up all my doubts. Looks like I have to use `base64`, since I need to compress only the `extras` field. – Rahul Dec 17 '15 at 20:53
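tdelaney's alternative keeps every field as plain text and compresses the serialized JSON once, as a whole. A minimal Python 3 sketch, assuming a hypothetical `data` dictionary in place of the real one; the compressed result is raw bytes, so it suits storage or transport (for example a gzip-encoded HTTP body) rather than embedding inside another JSON document:

```python
import hashlib
import json
import zlib

# Hypothetical payload; 'extras' stays an ordinary (uncompressed) string.
data = {"extras": '{"listing": {"city": "Bengaluru", "country": "India"}}'}

# Serialize once with a stable key order so the hash is reproducible.
payload = json.dumps(data, sort_keys=True)

# md5() needs bytes in Python 3, so encode the JSON text first.
digest = hashlib.md5(payload.encode("utf-8")).hexdigest()

# Compress the whole serialized payload instead of one field inside it.
compressed = zlib.compress(payload.encode("utf-8"))

# Reversing the steps gives back the original dictionary.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))
```

The trade-off is the one Rahul points out: this only helps when the entire payload can travel as binary, not when a single field must be compressed inside otherwise readable JSON.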

1 Answer


You could base64-encode the compressed bytes first to get this to work. Base64 adds about a third to the size of the string, but that is probably less than you saved by compressing it:

import base64, zlib
data['extras'] = base64.b64encode(zlib.compress("My string".encode("utf-8"))).decode("ascii")  # ASCII text, safe for json.dumps
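The same approach, sketched end to end for Python 3, where `b64encode` returns `bytes` and is therefore decoded to an ASCII `str` before going into the dictionary. It includes the hash from the question and the reverse path; the `text` value is a placeholder assumption:

```python
import base64
import hashlib
import json
import zlib

text = "My string" * 1000  # placeholder for the real (large) JSON string

compressed = zlib.compress(text.encode("utf-8"))
# base64 output is pure ASCII, so it can live in the dict as a normal string.
data = {"extras": base64.b64encode(compressed).decode("ascii")}

# Hash the dictionary as in the question (md5 needs bytes in Python 3).
digest = hashlib.md5(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()

# base64 maps every 3 input bytes to 4 output characters, padding the last group.
encoded_len = len(data["extras"])
expected_len = 4 * ((len(compressed) + 2) // 3)

# Reading it back: decode base64, decompress, then decode UTF-8.
restored = zlib.decompress(base64.b64decode(data["extras"])).decode("utf-8")
```

The 4/3 ratio matches the numbers in the comments: 9204 compressed bytes become 4 × 9204 / 3 = 12272 base64 characters.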
Turn