I'm running into an issue with a Jupyter notebook project that I'm trying to get working on my Windows 10 machine with Python 3. I get a TypeError (full traceback below) from this code:

buy_per_min = (buy
               .groupby([pd.Grouper(key='timestamp', freq='Min'), 'price'])
               .shares
               .sum()
               .apply(np.log)
               .to_frame('shares')
               .reset_index('price')
               .between_time(market_open, market_close)
               .groupby(level='timestamp', as_index=False, group_keys=False)
               .apply(lambda x: x.nlargest(columns='price', n=depth))
               .reset_index())
buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).astype(int)
buy_per_min.info()

The issue is in the buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).astype(int) line, but I don't fully understand why I'm getting it. This is the full traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-396768b710c8> in <module>()
     10                .apply(lambda x: x.nlargest(columns='price', n=depth))
     11                .reset_index())
---> 12 buy_per_min.timestamp = buy_per_min.timestamp.add(utc_offset).astype(int)
     13 buy_per_min.info()

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, **kwargs)
   2137 
   2138         # delegate
-> 2139         return super(DatetimeBlock, self)._astype(dtype=dtype, **kwargs)
   2140 
   2141     def _can_hold_element(self, element):

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

~\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    644         raise TypeError("cannot astype a datetimelike from [{from_dtype}] "
    645                         "to [{to_dtype}]".format(from_dtype=arr.dtype,
--> 646                                                  to_dtype=dtype))
    647 
    648     elif is_timedelta64_dtype(arr):

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

Is there some kind of conversion I need to do to the timestamp info, and what might it look like? Thanks!

UPDATE

A similar question has been asked before. I have already read it, but I don't see how it applies to my issue and would love an explanation if someone has one. It can be found here:

Pandas DataFrame - 'cannot astype a datetimelike from [datetime64[ns]] to [float64]' when using ols/linear regression

wildcat89
  • @jezrael, I did see that answer, but it doesn't answer my particular question, and again, I'm not entirely sure what the error means, so I was also hoping for an explanation. The other answer didn't explain the 'why' behind it either. I would love a working solution for my problem and an explanation of why it works so that I can learn from it. Thanks! – wildcat89 May 25 '19 at 19:17
  • Sure, is it possible to add some data for testing a solution? – jezrael May 25 '19 at 19:18
  • @jezrael Check out this project: https://github.com/PacktPublishing/Hands-On-Machine-Learning-for-Algorithmic-Trading/blob/master/Chapter02/01_NASDAQ_TotalView-ITCH_Order_Book/01_build_itch_order_book.ipynb ...scroll down to In `[102]:` under **Order Book Depth** to see everything. This is the project I'm trying to run on my machine, but I'm running into that error when I try it. Thanks! – wildcat89 May 25 '19 at 19:22

3 Answers

pandas cannot convert datetimes to int32, so it raises an error (the error message shows that plain int was resolved to int32 on this machine). Converting to np.int64 works and returns the datetimes in their native format, nanoseconds since the epoch. Converting the underlying NumPy array with .values.astype(int) also runs, but gives wrong values:

from datetime import timedelta

import numpy as np
import pandas as pd

# Small stand-in for the real data: three daily timestamps
rng = pd.date_range('2017-04-03 12:00:45', periods=3)
buy_per_min = pd.DataFrame({'timestamp': rng})

utc_offset = timedelta(hours=4)

print (buy_per_min.timestamp.add(utc_offset))
0   2017-04-03 16:00:45
1   2017-04-04 16:00:45
2   2017-04-05 16:00:45
Name: timestamp, dtype: datetime64[ns]

print (buy_per_min.timestamp.add(utc_offset).values)
['2017-04-03T16:00:45.000000000' '2017-04-04T16:00:45.000000000'
 '2017-04-05T16:00:45.000000000']
print (buy_per_min.timestamp.add(utc_offset).values.astype(np.int64))
[1491235245000000000 1491321645000000000 1491408045000000000]

print (buy_per_min.timestamp.add(utc_offset).astype(np.int64))
0    1491235245000000000
1    1491321645000000000
2    1491408045000000000
Name: timestamp, dtype: int64

# Casting the raw NumPy array to int (int32 here) silently overflows,
# see https://stackoverflow.com/a/12716674
print (buy_per_min.timestamp.add(utc_offset).values.astype(int))
[ -289111552 -2146205184   291668480]
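The wrong values in the last output are consistent with the 64-bit nanosecond counts being cut down to their low 32 bits. The following is only a sketch to illustrate that: it reuses the three nanosecond values printed above, forces the int32 cast explicitly with np.int32 so it behaves the same on any platform, and the names ns, wrapped and by_hand are made up for the illustration.

```python
import numpy as np

# Nanoseconds since the epoch, copied from the int64 output above
ns = np.array([1491235245000000000,
               1491321645000000000,
               1491408045000000000], dtype=np.int64)

# An int64 -> int32 cast keeps only the low 32 bits of each value
wrapped = ns.astype(np.int32)

# The same wrap-around done by hand with plain Python integers
by_hand = [((int(v) + 2**31) % 2**32) - 2**31 for v in ns]

print(wrapped)
print(by_hand)
```

This is why the NumPy-level cast is not a safe workaround: it does not raise, it just returns numbers that no longer represent the timestamps.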
jezrael

Changing .astype(int) to .astype('int64') solves the issue too.
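As a rough sketch of how that change might look on the line from the question, assuming the same small stand-in frame used in the other answer (the real buy_per_min comes from the order book data). The epoch_seconds variant is an extra, not part of the original code: it divides by a one-second Timedelta, so the result does not depend on the platform's default integer size.

```python
from datetime import timedelta

import pandas as pd

# Stand-in for the real data (assumption: a datetime64 'timestamp' column)
buy_per_min = pd.DataFrame(
    {'timestamp': pd.date_range('2017-04-03 12:00:45', periods=3)})
utc_offset = timedelta(hours=4)

shifted = buy_per_min.timestamp.add(utc_offset)

# Explicit 64-bit target instead of plain int
as_ns = shifted.astype('int64')

# Alternative: whole seconds since 1970-01-01
epoch_seconds = (shifted - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)

print(as_ns.dtype)
print(epoch_seconds.tolist())
```

Which of the two is appropriate depends on what the downstream code expects the integer to mean.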

From the Pylance documentation of the .astype() method:

(method) astype: (dtype: Any | _str | Type[str] | Type[bytes] | Type[date] | Type[datetime] | Type[timedelta] | Type[bool] | Type[int] | Type[float] | Type[complex] | Type[Timestamp] | Type[Timedelta], copy: _bool = ..., errors: _str = ...) -> Series
Cast a pandas object to a specified dtype dtype.

Parameters
dtype : data type, or dict of column name -> data type
    Use a numpy.dtype or Python type to cast entire pandas object to
    the same type. Alternatively, use {col: dtype, ...}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame's columns to column-specific types.
copy : bool, default True
    Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects).
errors : {'raise', 'ignore'}, default 'raise'
    Control raising of exceptions on invalid data for provided dtype.
    raise : allow exceptions to be raised
    ignore : suppress exceptions. On error return original object.

Returns
casted : same type as caller

See Also
to_datetime : Convert argument to datetime.
to_timedelta : Convert argument to timedelta.
to_numeric : Convert argument to a numeric type.
numpy.ndarray.astype : Cast a numpy array to a specified type.

Notes
Examples
Create a DataFrame:

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df.dtypes
col1    int64
col2    int64
dtype: object
Cast all columns to int32:

>>> df.astype('int32').dtypes
col1    int32
col2    int32
dtype: object
Cast col1 to int32 using a dictionary:

>>> df.astype({'col1': 'int32'}).dtypes
col1    int32
col2    int64
dtype: object
Create a series:

>>> ser = pd.Series([1, 2], dtype='int32')
>>> ser
0    1
1    2
dtype: int32
>>> ser.astype('int64')
0    1
1    2
dtype: int64
Convert to categorical type:

>>> ser.astype('category')
0    1
1    2
dtype: category
Categories (2, int64): [1, 2]
Convert to ordered categorical type with custom ordering:

>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(
...     categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]
Note that using copy=False and changing data on a new pandas object may propagate changes:

>>> s1 = pd.Series([1, 2])
>>> s2 = s1.astype('int64', copy=False)
>>> s2[0] = 10
>>> s1  # note that s1[0] has changed too
0    10
1     2
dtype: int64
Create a series of dates:

>>> ser_date = pd.Series(pd.date_range('20200101', periods=3))
>>> ser_date
0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
Bakr

I have just hit a very similar issue:

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [bool]

In my case the problem was solved by adding parentheses. Compare this:

df2 = df[
    (df['column1'] != df['column2']) &
    df['column3'] >= '03.02.2020'
].copy()

to this:

df2 = df[
    (df['column1'] != df['column2']) &
    (df['column3'] >= '03.02.2020')
].copy()

In my case the error was triggered by operator precedence: & binds more tightly than >=, so without the parentheses the & operator was applied to the datetime-based column column3 itself, and pandas tried to cast that column to bool.
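A small sketch of the precedence point, first with plain Python values and then with a made-up frame. The column names match the snippet above, but the data and the ISO-formatted cutoff date are invented for illustration.

```python
import pandas as pd

# Plain Python: & is evaluated before >=
without_parens = True & 5 >= 3    # parsed as (True & 5) >= 3
with_parens = True & (5 >= 3)     # comparison first, then &
print(without_parens, with_parens)

# Hypothetical data with a datetime column
df = pd.DataFrame({
    'column1': [1, 2, 3],
    'column2': [1, 5, 3],
    'column3': pd.to_datetime(['2020-02-01', '2020-02-05', '2020-02-10']),
})

# Each comparison is wrapped in parentheses so that & only ever
# combines two boolean Series
mask = (df['column1'] != df['column2']) & (df['column3'] >= pd.Timestamp('2020-02-03'))
df2 = df[mask].copy()
print(df2)
```

Wrapping every comparison in its own parentheses before combining with & or | avoids this class of error regardless of the column types involved.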

Stanislav Pankevich