question

I'm having trouble identifying numpy.int64 objects, in order to convert them to base python int for json serialisation. isinstance usually works but does not in the following example, and I would love to understand why this is.

>>> x
0
>>> type(x)
<class 'numpy.int64'>
>>> import numpy
>>> isinstance(x, numpy.int64)
False

context

x in the above comes from my application, generated by to_dict on a pandas dataframe. Various dataframes are used to generate the result, hence why I can't just use pandas to_json instead.

Taking hints from How to identify numpy types in python?, I've actually managed to successfully detect these items (which sometimes aren't numpy objects at all), using the following:

>>> (isinstance(x, (pd.np.ndarray, pd.np.generic)) and
...  pd.np.issubdtype(x, pd.np.dtype('int64')))
True

but I would very much appreciate it if someone could explain why the first option doesn't work, so that I can be confident enough to deploy this into our production system. Using simplejson and a custom JSONEncoder class which uses isinstance(obj, pd.np.int64) has worked for months, but it has suddenly stopped working with the example above.

pickle.dumps(x) gives b'\x80\x03cnumpy.core.multiarray\nscalar\nq\x00cnumpy\ndtype\nq\x01X\x02\x00\x00\x00i8q\x02K\x00K\x01\x87q\x03Rq\x04(K\x03X\x01\x00\x00\x00<q\x05NNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x06bC\x08\x00\x00\x00\x00\x00\x00\x00\x00q\x07\x86q\x08Rq\t.'

interestingly, pickling the object seems to fix the issue.

>>> isinstance(pickle.loads(pickle.dumps(x)), pd.np.int64)
True
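
For readers hitting the same symptom, here is a minimal sketch of an encoder that avoids comparing against a specific `numpy.int64` class object. It is not the encoder from the question: the class name `NumpyTolerantEncoder` is hypothetical, it uses the standard-library `json` module rather than simplejson, and it assumes NumPy's integer and floating scalar types are registered with the `numbers` abstract base classes (which NumPy does at import time).

```python
import json
import numbers

import numpy as np


class NumpyTolerantEncoder(json.JSONEncoder):
    """Hypothetical encoder: converts NumPy scalars and arrays to builtins."""

    def default(self, obj):
        # numbers.Integral lives in the standard library, so this check
        # does not depend on the identity of any particular numpy.int64 class.
        if isinstance(obj, numbers.Integral):
            return int(obj)
        if isinstance(obj, numbers.Real):
            return float(obj)
        # Duck-typed fallback for arrays and other NumPy scalars.
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)


encoded = json.dumps(
    {"count": np.int64(0), "values": np.arange(3)},
    cls=NumpyTolerantEncoder,
)
print(encoded)  # {"count": 0, "values": [0, 1, 2]}
```

Anything the encoder does not recognise still falls through to `super().default`, which raises `TypeError`, so unexpected objects are not silently converted.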
Peter Fine
  • what's your python setup? (version, etc...) I cannot reproduce on anaconda python 3.6 (windows 64 bit) – Aaron Mar 21 '17 at 16:03
  • have you tried `type(x) == np.int64`? also why do you have a numpy type before importing numpy? – Aaron Mar 21 '17 at 16:05
  • Python 3.4.3 (default, Aug 20 2015, 14:15:05) [GCC 4.6.3] on linux – Peter Fine Mar 21 '17 at 16:09
  • `type(x) == np.int64` is `False` – Peter Fine Mar 21 '17 at 16:09
  • pd.__version__ = '0.18.1', and pd.np.__version__ = '1.11.0' – Peter Fine Mar 21 '17 at 16:10
  • Is this a repeatable example? It will be hard for us to help unless we can generate the problem object as well. There's no way of knowing why/how the original `x` is different from the `pickle` round trip. – hpaulj Mar 21 '17 at 16:49
  • I tried pickling it to create a reproducible example, but was surprised that after pickling it worked. So I'm having trouble reproducing. @hpaulj is there some way I can interrogate x that might show up the difference before and after? – Peter Fine Mar 21 '17 at 17:24
  • It's barely possible that you have two numpys installed, and the code that creates `x` here is somehow picking up a different one than you get when you `import numpy` later. – Robert Kern Mar 21 '17 at 17:28
  • Check `pd.np.__file__` and `numpy.__file__`. – Robert Kern Mar 21 '17 at 17:31
  • @RobertKern `pd.np.__file__` = `'/home/vagrant/virtualenv/lib/python3.4/site-packages/numpy/__init__.py'` and `numpy.__file__` is the same – Peter Fine Mar 21 '17 at 17:48
  • And `pd.np is numpy`? – Robert Kern Mar 21 '17 at 18:38
  • Have you done any module reloading? That can cause fun issues like this. – Robert Kern Mar 21 '17 at 18:47
  • yes @RobertKern, tracing through the program I can see the problematic items were read from the database using pandas, processed, pickled then written to redis (as a caching layer). They are then recovered, processed some more, and then the problem occurs when I try to write the result. I can see how this can cause issues, but I have the end to end process running in debug from pycharm using the same virtual machine (through vagrant), same runtime and same libraries (confirmed in both places using `__file__`). It worked fine for months. Not easy to debug! Anything else I can compare perhaps? – Peter Fine Mar 21 '17 at 19:57
  • Set aside the `__file__` thing. Focus on the identity comparison: `pd.np is numpy`. If Python ends up importing the same module twice and doesn't realize that it's already imported, then you'll get problems like this (even if it's importing from the same files). If you are running this service via a web framework that automatically reloads modules (e.g. for development), it's possible that it accidentally reloaded the `numpy` module, creating two different `numpy.int64` types. – Robert Kern Mar 21 '17 at 21:59
  • Thanks @RobertKern, I think that must be the cause of the issue, and so I now understand why `pd.np.issubdtype` is a more suitable way to detect these items. Cheers for your help! – Peter Fine Mar 22 '17 at 16:58
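
The reload effect described in the comments can be reproduced without touching NumPy. The sketch below is only an illustration under assumptions: `fake_numpy` and `demo_reload` are hypothetical names, and a throwaway pure-Python module stands in for `numpy`. It shows that an object created before a reload is no longer an instance of the class found in the module afterwards, even though both classes have the same name and come from the same file.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload():
    """Show that reloading a module yields a new, distinct class object."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for numpy: a module defining one class.
        Path(tmp, "fake_numpy.py").write_text("class int64:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            fake_numpy = importlib.import_module("fake_numpy")
            x = fake_numpy.int64()  # instance of the original class
            before = isinstance(x, fake_numpy.int64)

            # Re-executing the module body rebinds fake_numpy.int64
            # to a brand-new class object.
            importlib.reload(fake_numpy)
            after = isinstance(x, fake_numpy.int64)
            same_name = type(x).__qualname__ == fake_numpy.int64.__qualname__
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("fake_numpy", None)
    return before, after, same_name


print(demo_reload())  # (True, False, True)
```

This matches the symptom in the question, where `type(x)` prints `<class 'numpy.int64'>` yet `isinstance(x, numpy.int64)` is `False`. Comparing `type(x) is numpy.int64` is a quick way to check for the same situation in a live session.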

0 Answers