There are excellent answers in this post, suitable for most cases. However, I needed a solution that works for all numpy types (e.g., complex numbers) and returns JSON-conformant output (i.e., commas as list separators, unsupported types converted to strings).
Test Data
import numpy as np
import json
data = np.array([0, 1+0j, 3.123, -1, 2, -5, 10], dtype=np.complex128)
data_dict = {'value': data.real[-1],
             'array': data.real,
             'complex_value': data[-1],
             'complex_array': data,
             'datetime_value': data.real.astype('datetime64[D]')[0],
             'datetime_array': data.real.astype('datetime64[D]'),
             }
Solution 1: Updated NpEncoder with Decoding to numpy
JSON natively supports only strings, numbers, booleans, arrays, and objects, but no special (d)types such as complex or datetime. One solution is to convert those special (d)types to arrays of strings, with the advantage that numpy can read them back easily, as outlined in the decoder section below.
class NpEncoder(json.JSONEncoder):
    def default(self, obj):
        dtypes = (np.datetime64, np.complexfloating)
        if isinstance(obj, dtypes):
            return str(obj)
        elif isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            if any(np.issubdtype(obj.dtype, i) for i in dtypes):
                return obj.astype(str).tolist()
            return obj.tolist()
        return super().default(obj)
# example usage
json_str = json.dumps(data_dict, cls=NpEncoder)
# {"value": 10.0, "array": [0.0, 1.0, 3.123, -1.0, 2.0, -5.0, 10.0], "complex_value": "(10+0j)", "complex_array": ["0j", "(1+0j)", "(3.123+0j)", "(-1+0j)", "(2+0j)", "(-5+0j)", "(10+0j)"], "datetime_value": "1970-01-01", "datetime_array": ["1970-01-01", "1970-01-02", "1970-01-04", "1969-12-31", "1970-01-03", "1969-12-27", "1970-01-11"]}
Decoding to numpy
Special (d)types must be converted manually after loading the JSON.
json_data = json.loads(json_str)
# Converting the types manually
json_data['complex_value'] = complex(json_data['complex_value'])
json_data['datetime_value'] = np.datetime64(json_data['datetime_value'])
json_data['array'] = np.array(json_data['array'])
json_data['complex_array'] = np.array(json_data['complex_array']).astype(np.complex128)
json_data['datetime_array'] = np.array(json_data['datetime_array']).astype(np.datetime64)
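These manual conversions can also be wrapped in a small helper. Below is a minimal sketch; the function name decode_np and its keyword arguments are my own invention for illustration, not part of any library:

```python
import json
import numpy as np

def decode_np(json_data, complex_keys=(), datetime_keys=(), array_keys=()):
    # Hypothetical helper: convert the listed keys of a decoded JSON dict
    # back to complex/datetime/numpy types in place.
    for k in complex_keys:
        v = json_data[k]
        json_data[k] = (np.array(v).astype(np.complex128)
                        if isinstance(v, list) else complex(v))
    for k in datetime_keys:
        v = json_data[k]
        json_data[k] = (np.array(v).astype(np.datetime64)
                        if isinstance(v, list) else np.datetime64(v))
    for k in array_keys:
        json_data[k] = np.array(json_data[k])
    return json_data

# example usage
decoded = decode_np(json.loads('{"complex_value": "(10+0j)", "array": [1.0, 2.0]}'),
                    complex_keys=('complex_value',), array_keys=('array',))
```

This keeps the key-to-type mapping in one place instead of repeating the conversion lines for every loaded document.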
Solution 2: Numpy.array2string
Another option is to convert numpy arrays or values to strings with numpy itself, i.e., np.array2string. This option should be pretty robust, and you can adapt the output as needed.
import sys
import numpy as np

def np_encoder(obj):
    if isinstance(obj, (np.generic, np.ndarray)):
        out = np.array2string(obj,
                              separator=',',
                              threshold=sys.maxsize,
                              precision=50,
                              floatmode='maxprec')
        # remove whitespaces and '\n'
        return out.replace(' ', '').replace('\n', '')
# example usage
json.dumps(data_dict, default=np_encoder)
# {"value": 10.0, "array": "[0.,1.,3.123,-1.,2.,-5.,10.]", "complex_value": "10.+0.j", "complex_array": "[0.+0.j,1.+0.j,3.123+0.j,-1.+0.j,2.+0.j,-5.+0.j,10.+0.j]", "datetime_value": "'1970-01-01'", "datetime_array": "['1970-01-01','1970-01-02','1970-01-04','1969-12-31','1970-01-03','1969-12-27','1970-01-11']"}
Comments:
- All numpy arrays are strings ("[1,2]" vs. [1,2]) and must be read with a special decoder.
- threshold=sys.maxsize returns as many entries as possible without triggering summarization (...).
- With the other parameters (precision, floatmode, formatter, ...) you can adapt the output as needed.
- For a compact JSON, I removed all whitespaces and linebreaks (.replace(' ','').replace('\n','')).
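Regarding the first comment, here is one way such a stringified array could be read back; the helper name parse_np_string is mine and just an illustration, assuming separator=',' and stripped whitespace as in the encoder above:

```python
import numpy as np

def parse_np_string(s, dtype=float):
    # Hypothetical helper: parse the compact "[1,2,3]" string produced by
    # np.array2string back into a numpy array. Quotes around datetime
    # entries are stripped as well.
    items = s.strip('[]').replace("'", '').split(',')
    return np.array(items).astype(dtype)

# example usage with outputs from the encoder above
arr = parse_np_string("[0.,1.,3.123,-1.,2.,-5.,10.]")
dts = parse_np_string("['1970-01-01','1970-01-02']", dtype='datetime64[D]')
```

Note that this simple split breaks for multi-dimensional arrays, which contain nested brackets; those would need a real parser.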