1

I have the following pandas

time                     vwap   sumQty  rvol    p_y        p_x
2019-01-07 09:00:00     10000.0 500.0   NaN   NaN   [1.0, 1.0, nan, nan, nan]
2019-01-07 09:05:00     2000.0  400.0   NaN -8000.0 [1.0, 2.0, 10000.0, nan, nan]
2019-01-07 09:10:00     1500.0  800.0   NaN -500.0  [1.0, 3.0, 2000.0, 10000.0, nan]
2019-01-07 09:15:00     2400.0  710.0   NaN 900.0   [1.0, 4.0, 1500.0, 2000.0, 10000.0]

For columns df['p_x'] and df['p_y'], I would like to stack/ combine every row with a rolling 3 windows before it and create two new columns df['r_x'] and df['r_y] for them respectively

Meaning at time = 09:15:00, df['r_y'] = [-8000, -500, 900] and df['r_x'] = [[1.0, 2.0, 10000.0, nan, nan], [1.0, 3.0, 2000.0, 10000.0, nan], [1.0, 4.0, 1500.0, 2000.0, 10000.0]] etc.

I have tried: df['r_x'] = df['p_x'].rolling(3).apply(lambda x: list(x)). However, I got this error:

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
   1701     def apply(self, func, raw=None, args=(), kwargs={}):
   1702         return super(Rolling, self).apply(
-> 1703             func, raw=raw, args=args, kwargs=kwargs)
   1704 
   1705     @Substitution(name='rolling')

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
   1010 
   1011         return self._apply(f, func, args=args, kwargs=kwargs,
-> 1012                            center=False, raw=raw)
   1013 
   1014     def sum(self, *args, **kwargs):

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _apply(self, func, name, window, center, check_minp, **kwargs)
    839         results = []
    840         for b in blocks:
--> 841             values = self._prep_values(b.values)
    842 
    843             if values.size == 0:

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _prep_values(self, values, kill_inf)
    212             except (ValueError, TypeError):
    213                 raise TypeError("cannot handle this type -> {0}"
--> 214                                 "".format(values.dtype))
    215 
    216         if kill_inf:

TypeError: cannot handle this type -> object

Could someone help me out? Thank you!

atjw94
  • 529
  • 1
  • 6
  • 22

2 Answers2

1

Here is necessary numpy:

a = np.array(df['p_x'].values.tolist())
print (a)
[[1.0e+00 1.0e+00     nan     nan     nan]
 [1.0e+00 2.0e+00 1.0e+04     nan     nan]
 [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
 [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]

#https://stackoverflow.com/a/45748530
L = 3 
s0,s1 = a.strides
shp = a.shape
out = a[range(L) + np.arange(shp[0]-L+1)[:,None]]
print (out)
[[[1.0e+00 1.0e+00     nan     nan     nan]
  [1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]]

 [[1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
  [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]

Or list comprehension:

#https://stackoverflow.com/a/45748613
L = 3 
out = np.array([a[i:i + 3] for i in range(0, len(a) - 2)])
print (out)
[[[1.0e+00 1.0e+00     nan     nan     nan]
  [1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]]

 [[1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
  [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you @jezrael, would like to check if you if this numpy array can be set as a dataframe column ```df['r_x']``` and what does each step means as I am trying to understand. – atjw94 Aug 05 '19 at 06:44
  • @atjw94 - hmmm, it is necessary? – jezrael Aug 05 '19 at 06:44
  • @atjw94 - I think yes, first 2 values are NaNs? – jezrael Aug 05 '19 at 06:46
  • yes, thank you so much and really sorry for the trouble! – atjw94 Aug 05 '19 at 06:50
  • @atjw94 - I try and it is not easy, because pandas bad handling non scalar values, my first idea failed `df.loc[df.index[2:], 'new'] = out.tolist() ` – jezrael Aug 05 '19 at 06:51
  • Hey @jezrael, thank you for your help, my friend and I have figured out how to do it after intensive searching on google, this is how we have done it: – atjw94 Aug 05 '19 at 07:06
  • ```!pip install rolling import rolling window_size = 3 df = pd.DataFrame({'a':[[1,2,3],[4,5],[6],[7,8],[9,10,11]]}) b = (window_size-1)*[np.nan] + list(rolling.Apply(df.a, window_size, operation=lambda x: list(np.block(x)))) df['b'] = b df``` – atjw94 Aug 05 '19 at 07:06
0
!pip install rolling
import rolling
window_size = 3
df = pd.DataFrame({'a':[[1,2,3],[4,5],[6],[7,8],[9,10,11]]})
b = (window_size-1)*[np.nan] + list(rolling.Apply(df.a, window_size, operation=lambda x: list(np.block(x))))
df['b'] = b
df
atjw94
  • 529
  • 1
  • 6
  • 22