2

I am trying to serialize a function's code object and send it as JSON:

import json
import marshal

def f(x): return x*x
def fi(x): return int(x[0])

# Python 2: func_code is the function's code object
code_string = marshal.dumps(fi.func_code)

jsn = {"code": code_string}
json.dumps(jsn)  # doesn't work if code_string is from fi

The code block above works if my function is f(x), but fails for fi(x).

Original exception was:

Traceback (most recent call last):
  File "/home/mohitdee/Documents/python_scala/rdd.py", line 41, in <module>
    send_data(json.dumps(jsn))
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x83 in position 32: invalid start byte
[48001 refs]

How do I resolve this in Python?

norok2
frazman
  • Do you *have* to use json? – msvalkon Mar 24 '14 at 20:11
  • The weird way that this problem is reported is only possible because of Python 2.x's broken string handling, wherein `str` and `bytes` are the same thing and there are sometimes implicit decode/encode steps. In 3.x, we get a much more obvious error message: `TypeError: Object of type bytes is not JSON serializable` (also, to get the actual code `bytes`, we need `.__code__.co_code`, rather than `.func_code`). – Karl Knechtel Sep 08 '22 at 07:40
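The Python 3 behaviour described in the comment above can be reproduced with a short sketch. It assumes Python 3, where `func_code` is spelled `__code__`; the variable names `code_bytes` and `error_message` are only illustrative.

```python
import json
import marshal


def fi(x):
    return int(x[0])


# marshal.dumps returns bytes in Python 3, not text
code_bytes = marshal.dumps(fi.__code__)

error_message = None
try:
    # json only knows how to encode str, numbers, lists, dicts, etc.
    json.dumps({"code": code_bytes})
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Here the failure is reported up front as a `TypeError`, rather than as a `UnicodeDecodeError` that depends on which byte values happen to appear in the marshalled data.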

4 Answers

6

marshal is a binary protocol, i.e. a bunch of bytes with a very custom interpretation. It's not text, and it doesn't conform to any particular text encoding. It is, for the most part, just a sequence of bits. If you absolutely need to embed those in a text protocol like JSON, you need to escape the bytes that don't make valid characters in the relevant encoding (to be safe, assume a subset of ASCII). The canonical solution is base64:

import base64

code_string = marshal.dumps(fi.func_code)
code_base64 = base64.b64encode(code_string)

jsn = {"code": code_base64}
Karl Knechtel
  • In 3.x, the result from `base64.b64encode` is still a `bytes` object, which still cannot be serialized. However, it is guaranteed to use only bytes from the ASCII range, so it can safely be `.decode('ASCII')`d into a `str`. – Karl Knechtel Sep 08 '22 at 07:47
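For reference, a minimal round-trip sketch of the base64 approach, assuming Python 3 (so `__code__` instead of `func_code`). The names `payload` and `restored` are illustrative, and rebuilding the function with `types.FunctionType` is one possible way to use the code object on the receiving side, not something the answer prescribes.

```python
import base64
import json
import marshal
import types


def fi(x):
    return int(x[0])


# Sender: code object -> marshal bytes -> base64 bytes -> ASCII str -> JSON
raw = marshal.dumps(fi.__code__)
payload = json.dumps({"code": base64.b64encode(raw).decode("ascii")})

# Receiver: JSON -> base64 str -> marshal bytes -> code object -> function
received = json.loads(payload)
code = marshal.loads(base64.b64decode(received["code"]))
restored = types.FunctionType(code, globals(), "fi")

print(restored(["1"]))
```

Note that the marshal format is specific to the Python version, so a sketch like this only works when both ends run compatible interpreters.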
3

You can serialize all kinds of live objects, including functions, using the cloud library implementation of pickle.

import cloud, pickle

def serialize(func):
    return cloud.serialization.cloudpickle.dumps(func)

def deserialize(string):
    return pickle.loads(string)
salezica
2

Try encoding it with base64 or some other algorithm of this sort.

Filip Malczak
2

Use pickle (or cPickle):

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure.

>>> import cPickle
>>> import json
>>> def fi(x):
...     return int(x[0])
... 
>>> fi(['1'])
1
>>> code_string = cPickle.dumps(fi)
>>> jsn = {"code": code_string}
>>> serialized = json.dumps(jsn)

>>> deserialized = json.loads(serialized)
>>> f = cPickle.loads(str(deserialized['code']))
>>> print f(['1'])
1
alecxe
  • 3
    Note that pickling a function only stores its module and name, not its code. That has some benefits, but also means the pickle is useless if the unpickler doesn't run (some version of) the same application that created the pickle. –  Mar 24 '14 at 20:18
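The point made in this comment can be checked with a small sketch, assuming Python 3 and using the standard library function `json.dumps` as the function being pickled (chosen only because it is importable everywhere). The pickle contains the module and function names, but not the function's bytecode.

```python
import json
import pickle

# Pickle a plain Python function from an importable module
data = pickle.dumps(json.dumps)

# The pickle records where to find the function...
has_module_name = b"json" in data
has_function_name = b"dumps" in data

# ...but the function's bytecode is not part of the payload
has_bytecode = json.dumps.__code__.co_code in data

print(has_module_name, has_function_name, has_bytecode)

# Unpickling simply looks the name up again and returns the same object
same_object = pickle.loads(data) is json.dumps
print(same_object)
```

This is why the receiving side must be able to import the same module: unpickling performs a lookup by name instead of rebuilding the function from its code.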