209

I have a data frame that stores store names and daily sales counts. I am trying to insert it into Salesforce using the Python script below.

However, I get the following error:

TypeError: Object of type 'int64' is not JSON serializable

Below is a view of the data frame.

Storename,Count
Store A,10
Store B,12
Store C,5

I use the following code to insert it into Salesforce.

update_list = []
for i in range(len(store)):
    update_data = {
        'name': store['entity_name'].iloc[i],
        'count__c': store['count'].iloc[i] 
    }
    update_list.append(update_data)

sf_data_cursor = sf_datapull.salesforce_login()
sf_data_cursor.bulk.Account.update(update_list)

I get the error when the last line above gets executed.

How do I fix this?

Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
dark horse
  • 3,211
  • 8
  • 19
  • 35
  • That call to `range` is suspicious. You are taking `len(store)` and wrapping that in a tuple, and then calling `range` on the tuple. If you remove one set of parentheses, does it fix the code? That is, try this: `for i in range(len(store)):`. – Tim Johns Jun 18 '18 at 20:00
  • 1
    @TimJohns A pair of parentheses around a number does not make it a tuple. `(34)` is still a number 34. But `(34,)` is a tuple. – DYZ Jun 18 '18 at 20:04
  • @DyZ Good point, I didn't realize that parentheses with a single argument are treated differently than if there are multiple arguments. – Tim Johns Jun 18 '18 at 20:13
  • 4
    @TimJohns The parens are irrelevant. `a=34,` is also a tuple – Sebastian Wozny Feb 20 '20 at 15:27
  • There is an open bug report about this issue: https://bugs.python.org/issue24313 – Mihai Capotă Jul 22 '20 at 20:59

13 Answers

205

You can define your own encoder to solve this problem.

import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super(NpEncoder, self).default(obj)

# Your code ...
json.dumps(data, cls=NpEncoder)
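
For instance, a quick sanity check with an np.int64 like the one in the question (the record dict below is an assumed stand-in for one row of the asker's data frame):

```python
import json
import numpy as np

class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)

# Stand-in for one row of the question's data frame
record = {'name': 'Store A', 'count__c': np.int64(10)}
print(json.dumps(record, cls=NpEncoder))  # {"name": "Store A", "count__c": 10}
```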
Tommy
  • 12,588
  • 14
  • 59
  • 110
Jie Yang
  • 2,187
  • 1
  • 9
  • 3
  • 2
    @IvanCamilitoRamirezVerdes That's exactly what I was looking for so was quite helpful, for me. I like the readability of throwing in an encoding class to the `json.dumps` function as well. – Jason R Stevens CFA May 20 '20 at 01:54
  • did not work for me for some reason... does this support recursive calls? – adir abargil Dec 02 '20 at 14:43
  • 4
    I also needed to add ```elif isinstance(obj, np.bool_): return bool(obj)``` – James McKeown Feb 08 '21 at 19:32
  • 3
    I've edited this answer to remove the unnecessary "elif"s, since the branches already return. – Tommy Jul 31 '21 at 11:28
  • 2
    For most cases, this is quite an overkill. [The solution by Tharindu Sathischandra](https://stackoverflow.com/questions/50916422/python-typeerror-object-of-type-int64-is-not-json-serializable/66345356#66345356) is more straightforward. – Gnnr Mar 11 '22 at 07:47
  • What a shame that Python does not offer JSON Converter for any type it uses. Python feels like 2 centuries ago... :-( – Tom Apr 30 '23 at 20:55
192

json does not recognize NumPy data types. Convert the number to a Python int before serializing the object:

'count__c': int(store['count'].iloc[i])
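
Applied to the question's loop, a minimal sketch might look like the following (the toy frame and column names mirror the question's code and are assumptions about the real data):

```python
import json
import pandas as pd

# Toy frame standing in for the question's data
store = pd.DataFrame({'entity_name': ['Store A', 'Store B', 'Store C'],
                      'count': [10, 12, 5]})

update_list = []
for i in range(len(store)):
    update_list.append({
        'name': store['entity_name'].iloc[i],
        'count__c': int(store['count'].iloc[i]),  # np.int64 -> plain int
    })

print(json.dumps(update_list))  # serializes without the TypeError
```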
DYZ
  • 55,249
  • 10
  • 64
  • 93
42

I'll throw my answer into the ring as a slightly more robust version of @Jie Yang's excellent solution.

My solution

It builds on the numpyencoder package and its repository.

from numpyencoder import NumpyEncoder

numpy_data = np.array([0, 1, 2, 3])

with open(json_file, 'w') as file:
    json.dump(numpy_data, file, indent=4, sort_keys=True,
              separators=(', ', ': '), ensure_ascii=False,
              cls=NumpyEncoder)

The breakdown

If you dig into hmallen's code in the numpyencoder/numpyencoder.py file you'll see that it's very similar to @Jie Yang's answer:


class NumpyEncoder(json.JSONEncoder):
    """ Custom encoder for numpy data types """
    def default(self, obj):
        if isinstance(obj, (np.int_, np.intc, np.intp, np.int8,
                            np.int16, np.int32, np.int64, np.uint8,
                            np.uint16, np.uint32, np.uint64)):

            return int(obj)

        elif isinstance(obj, (np.float_, np.float16, np.float32, np.float64)):
            return float(obj)

        elif isinstance(obj, (np.complex_, np.complex64, np.complex128)):
            return {'real': obj.real, 'imag': obj.imag}

        elif isinstance(obj, (np.ndarray,)):
            return obj.tolist()

        elif isinstance(obj, (np.bool_)):
            return bool(obj)

        elif isinstance(obj, (np.void)): 
            return None

        return json.JSONEncoder.default(self, obj)
Jason R Stevens CFA
  • 2,232
  • 1
  • 22
  • 28
  • 1
    Not working in my case. I have this error: TypeError: keys must be str, int, float, bool or None, not bool_ – Amarpreet Singh Jun 13 '22 at 17:31
  • 1
    Nice solution, but doesn't work with complex arrays for me. – Andrew Sep 07 '22 at 11:53
  • @Andrew What specific type of complex array are you trying to map and to what types would you like it to map? If you add your own `elif` statement for complex arrays it should work! – Jason R Stevens CFA Sep 11 '22 at 18:33
  • @Jason R Stevens CFA: e.g.: `np.array([1 + 1j, 2 + 2j])`. Sure, an `elif` would work, too. I'll added a more generic method for numpy here in the answers, so you don't have to add the various numpy types manually. https://stackoverflow.com/a/73634275/15330539 – Andrew Sep 11 '22 at 21:35
32

A very simple numpy encoder can achieve similar results more generically.

Note this uses the np.generic class (which most numpy classes inherit from) and the .item() method.

If the object to encode is not a numpy instance, then the json serializer will continue as normal. This is ideal for dictionaries with some numpy objects and some other class objects.

import json
import numpy as np

def np_encoder(object):
    if isinstance(object, np.generic):
        return object.item()

json.dumps(obj, default=np_encoder)
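
A self-contained run with mixed data (the dict below is illustrative) could look like:

```python
import json
import numpy as np

def np_encoder(object):
    if isinstance(object, np.generic):
        return object.item()  # np.int64 -> int, np.float64 -> float, ...

# Mixed numpy scalars and plain Python objects
mixed = {'count': np.int64(5), 'ratio': np.float64(0.5), 'label': 'Store A'}
print(json.dumps(mixed, default=np_encoder))  # {"count": 5, "ratio": 0.5, "label": "Store A"}
```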
conmak
  • 1,200
  • 10
  • 13
  • 3
    Short and concise. – Vasantha Ganesh Apr 21 '21 at 07:53
  • This didn't work when I had a 0-dim array like `np.array(1)`. Simple fix: `if isinstance(object, (np.generic, np.ndarray))` – Michael May 12 '21 at 11:39
  • 1
    Shouldn't this raise `TypeError` if the condition is false? Or else it implicitly returns `None` which `json` will map to `null`. – Nathaniel Verhaaren Oct 20 '21 at 20:03
  • 1
    Since this is treated as a json dumps default, if nothing is returned, the encoder proceeds as normal. In this case we specifically only attempt to serialize numpy objects in this way. Non numpy objects are serialized using the normal json dumps process. This is a great way to serialize nested dictionaries of mixed object types that may include numpy objects. – conmak Nov 07 '21 at 14:30
  • 2
    This is more generic and short solution and worked for me. – Syed Muhammad Asad Jun 29 '22 at 07:30
  • I agree with @NathanielVerhaaren. This solution may work for what most people are coming here for, but it is incomplete. Anything that is not covered by the conditions (i.e. something un-serializable that is not `np.generic`) will be mapped to `null`. Just try with an empty function and you'll notice that it doesn't raise any errors. The json.dumps [doc page](https://docs.python.org/3/library/json.html#json.dump) is clear about this. – victorlin Dec 30 '22 at 20:25
  • I was debating between creating a `default` function (this solution) and creating a custom `JSONEncoder` subclass. The custom encoder is more complete since it allows proper fall-back to the default encoder which automatically raises meaningful `TypeError`s such as the one in the question title. – victorlin Dec 30 '22 at 20:27
  • Also, there is nothing stating that `object.item()` converts numpy objects into serializable values. It probably isn't what you want when `object` is a `numpy.ndarray` ([docs](https://numpy.org/doc/1.23/reference/generated/numpy.ndarray.item.html): "Copy an **element** of an array to a standard Python scalar and return it"). Instead, `object.tolist()` returns a list of Python scalars. – victorlin Dec 30 '22 at 20:35
  • @victorlin it's is important to note that this is intended to serialize mixed numpy and non numpy objects. Since it uses default, it should not fail if an item is not a numpy item. That is by design. – conmak Dec 31 '22 at 21:07
  • @conmak this answer only adds serialization of numpy objects. Yes, the default `JSONEncoder` is used for standard serializable objects. However, the category of non-serializable, non-numpy objects is a gap that silently translates to `null`. For those, I'd rather have the program raise a `TypeError`. – victorlin Jan 02 '23 at 05:36
20

Actually, there is no need to write an encoder; just passing default=str when calling the json.dumps function takes care of most types by itself. So, in one line of code:

json.dumps(data, default=str)

From the docs of json.dumps and json.dump: https://docs.python.org/3/library/json.html#json.dump

If specified, default should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError. If not specified, TypeError is raised.

So calling str converts the numpy types (such as numpy ints or numpy floats) to strings that can be parsed by json. If you have numpy arrays or ranges, they have to be converted to lists first though. In this case, writing an encoder as suggested by Jie Yang might be a more suitable solution.
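
A quick illustration of the trade-off between the two variants (the sample dict mimics the question's data):

```python
import json
import numpy as np

data = {'count__c': np.int64(10)}

print(json.dumps(data, default=str))  # {"count__c": "10"} -- the value becomes a string
print(json.dumps(data, default=int))  # {"count__c": 10}   -- stays a number (only helps
                                      # when the offending types are all integer-like)
```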

mapazarr
  • 373
  • 2
  • 7
  • 2
    Excellent solution. However, one drawback is that numpy arrays are returned without delimiter, e.g. '[0 1 3]'. As a json list usually has a comma as delimiter that could be added in a second step, though. – Andrew Sep 07 '22 at 10:07
  • Slight improvement, if all types are known (except int64), then coercing to `int` means your numbers remain numbers in the JSON (instead of becoming strings): `json.dump(vocab, outfile, default=int)` – FrozenKiwi Mar 14 '23 at 21:00
13

If you are going to serialize a numpy array, you can simply use the ndarray.tolist() method.

From numpy docs,

a.tolist() is almost the same as list(a), except that tolist changes numpy scalars to Python scalars

In [1]: a = np.uint32([1, 2])

In [2]: type(list(a)[0])
Out[2]: numpy.uint32

In [3]: type(a.tolist()[0])
Out[3]: int
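
In other words, tolist() yields something json can handle directly, while list() does not; a minimal check:

```python
import json
import numpy as np

a = np.uint32([1, 2])

# json.dumps(list(a)) would raise TypeError: the items are still np.uint32
print(json.dumps(a.tolist()))  # [1, 2]
```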
Tharindu Sathischandra
  • 1,654
  • 1
  • 15
  • 37
  • 3
    This was my cause and simple fix. infact writing encoder to to convert np array to list is overkill. thanks – sandeepsign Aug 31 '21 at 19:59
  • Does work for float, int, and bool but not with datetime, complex types in combination with `json.dump(s)` – Andrew Sep 07 '22 at 11:58
6

This might be a late response, but I recently got the same error. After a lot of searching, this solution helped me.

import datetime
import numpy as np

def myconverter(obj):
    if isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, datetime.datetime):
        return obj.__str__()

Call myconverter in json.dumps() like below: json.dumps(message, default=myconverter)

shiva
  • 5,083
  • 5
  • 23
  • 42
  • 1
    or you can use `elif isinstance(obj, (datetime.date, datetime.datetime)): return obj.isoformat()` – Colin Anthony Sep 16 '20 at 08:23
  • or even `if isinstance(obj, datetime.date): return obj.isoformat()` `datetime.datetime` is a subclass of `datetime.date` – absoup Dec 15 '21 at 22:02
5

Here's a version that handles bools, and serializes NaN values (which are not part of the JSON spec) as null.

import json
import numpy as np

class NpJsonEncoder(json.JSONEncoder):
  """Serializes numpy objects as json."""

  def default(self, obj):
    if isinstance(obj, np.integer):
      return int(obj)
    elif isinstance(obj, np.bool_):
      return bool(obj)
    elif isinstance(obj, np.floating):
      if np.isnan(obj):
        return None  # Serialized as JSON null.
      return float(obj)
    elif isinstance(obj, np.ndarray):
      return obj.tolist()
    else:
      return super().default(obj)

# Your code ... 
json.dumps(data, cls=NpJsonEncoder)
Max Bileschi
  • 2,103
  • 2
  • 21
  • 19
1

If you have this error

TypeError: Object of type 'int64' is not JSON serializable

You can cast the specific columns with int dtype to float64, for example:

df = df.astype({'col1_int':'float64', 'col2_int':'float64', etc..})

float64 values are written to Google Spreadsheets without issues.
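
This works because np.float64 is a subclass of Python's float (so json serializes it natively), while np.int64 is not a subclass of int. A minimal sketch with a toy frame:

```python
import json
import pandas as pd

df = pd.DataFrame({'count': [10, 12, 5]})   # dtype: int64
df = df.astype({'count': 'float64'})

value = df['count'].iloc[0]                 # np.float64, a subclass of float
print(json.dumps({'count__c': value}))      # {"count__c": 10.0}
```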

1

There are excellent answers in this post, suitable for most cases. However, I needed a solution that works for all numpy types (e.g., complex numbers) and returns conformant JSON (i.e., a comma as the list separator, unsupported types converted to strings).

Test Data

import numpy as np
import json

data = np.array([0, 1+0j, 3.123, -1, 2, -5, 10], dtype=np.complex128)
data_dict = {'value': data.real[-1], 
             'array': data.real,
             'complex_value': data[-1], 
             'complex_array': data,
             'datetime_value': data.real.astype('datetime64[D]')[0],
             'datetime_array': data.real.astype('datetime64[D]'),
           }

Solution 1: Updated NpEncoder with Decoding to numpy

JSON natively supports only strings, integers, and floats but no special (d)types such as complex or datetime. One solution is to convert those special (d)types to an array of strings with the advantage that numpy can read it back easily, as outlined in the decoder section below.

class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        dtypes = (np.datetime64, np.complexfloating)
        if isinstance(obj, dtypes):
            return str(obj)
        elif isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            if any([np.issubdtype(obj.dtype, i) for i in dtypes]):
                return obj.astype(str).tolist()
            return obj.tolist()
        return super(NpEncoder, self).default(obj)

# example usage
json_str = json.dumps(data_dict, cls=NpEncoder)
# {"value": 10.0, "array": [0.0, 1.0, 3.123, -1.0, 2.0, -5.0, 10.0], "complex_value": "(10+0j)", "complex_array": ["0j", "(1+0j)", "(3.123+0j)", "(-1+0j)", "(2+0j)", "(-5+0j)", "(10+0j)"], "datetime_value": "1970-01-01", "datetime_array": ["1970-01-01", "1970-01-02", "1970-01-04", "1969-12-31", "1970-01-03", "1969-12-27", "1970-01-11"]}

Decoding to numpy

Special (d)types must be converted manually after loading the JSON.

json_data = json.loads(json_str)

# Converting the types manually
json_data['complex_value'] = complex(json_data['complex_value'])
json_data['datetime_value'] = np.datetime64(json_data['datetime_value'])

json_data['array'] = np.array(json_data['array'])
json_data['complex_array'] = np.array(json_data['complex_array']).astype(np.complex128)
json_data['datetime_array'] = np.array(json_data['datetime_array']).astype(np.datetime64)

Solution 2: Numpy.array2string

Another option is to convert numpy arrays or values to strings within numpy itself, i.e. with np.array2string. This option should be pretty robust, and you can adapt the output as needed.

import sys
import numpy as np

def np_encoder(obj):
    if isinstance(obj, (np.generic, np.ndarray)):
        out = np.array2string(obj,
                              separator=',',
                              threshold=sys.maxsize,
                              precision=50,
                              floatmode='maxprec')
        # remove whitespaces and '\n'
        return out.replace(' ','').replace('\n','')

# example usage
json.dumps(data_dict, default=np_encoder)
# {"value": 10.0, "array": "[0.,1.,3.123,-1.,2.,-5.,10.]", "complex_value": "10.+0.j", "complex_array": "[0.+0.j,1.+0.j,3.123+0.j,-1.+0.j,2.+0.j,-5.+0.j,10.+0.j]", "datetime_value": "'1970-01-01'", "datetime_array": "['1970-01-01','1970-01-02','1970-01-04','1969-12-31','1970-01-03','1969-12-27','1970-01-11']"}

Comments:

  • all numpy arrays are strings ("[1,2]" vs. [1,2]) and must be read with a special decoder
  • threshold=sys.maxsize returns as many entries as possible without triggering summarization (...,).
  • With the other parameters (precision, floatmode, formatter, ...) you can adapt your output as needed.
  • For a compact JSON, I removed all whitespaces and linebreaks (.replace(' ','').replace('\n','')).
Andrew
  • 817
  • 4
  • 9
  • complex number is not a standard feature of JSON, so you might have to take some extra care to make sure that the deserializer understands your chosen format – LudvigH Aug 24 '23 at 09:45
  • 1
    That's correct, and the workaround is to convert the non-native JSON formats to strings. At the JSON import, the conversion to the non-native JSON formats can be then done manually. Also added the latest version I'm using (the first solution). – Andrew Aug 28 '23 at 21:13
-1
update_data = {
    'name': str(store['entity_name'].iloc[i]),
    'count__c': str(store['count'].iloc[i]) 
}
Kao-Yuan Lin
  • 65
  • 1
  • 6
-2

If you have control over the creation of DataFrame, you can force it to use standard Python types for values (e.g. int instead of numpy.int64) by setting dtype to object:

df = pd.DataFrame(data=some_your_data, dtype=object)

The obvious downside is that you get less performance than with primitive datatypes. But I like this solution tbh, it's really simple and eliminates all possible type problems. No need to give any hints to the ORM or json.
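
A short check of this behaviour (toy frame, illustrative only):

```python
import json
import pandas as pd

df = pd.DataFrame({'count': [10, 12, 5]}, dtype=object)

value = df['count'].iloc[0]             # a plain Python int, not np.int64
print(type(value))                      # <class 'int'>
print(json.dumps({'count__c': value}))  # {"count__c": 10}
```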

Expurple
  • 877
  • 6
  • 13
-2

I was able to make it work by loading the dump back:

Code:

import json

json.loads(json.dumps(your_df.to_dict()))

Ruth
  • 1