
I am having trouble processing data with a pandas rolling window and a simple custom function. I have three columns of values and want to use a list comprehension to combine them into a single column for further processing. In my example I simply sum the values, which produces exactly one value per window. But the list comprehension seems to fail:

import pandas as pd
import numpy as np
from collections import Counter as count

df = pd.DataFrame(np.random.randint(0,100,size=(50, 3)), columns=list('ABC'))

def my_test(data):
    Abs = [int(np.sqrt(x[0]**2+x[1]**2+x[2]**2)/10) for x in data]
    return sum(Abs)

entr = df.rolling(10).apply(my_test)

This is the error message I get when executing the function:

entr =  df.rolling(10).apply(my_test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 1207, in apply
    return super(Rolling, self).apply(func, args=args, kwargs=kwargs)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 856, in apply
    center=False)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 799, in _apply
    result = np.apply_along_axis(calc, self.axis, values)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\shape_base.py", line 116, in apply_along_axis
    res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 795, in calc
    closed=self.closed)
  File "C:\Users\tpotrusil\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\window.py", line 853, in f
    offset, func, args, kwargs)
  File "pandas\_libs\window.pyx", line 1450, in pandas._libs.window.roll_generic (pandas\_libs\window.c:36061)
  File "<stdin>", line 2, in my_test
  File "<stdin>", line 2, in <listcomp>
IndexError: invalid index to scalar variable.

Any idea how I can access the data inside the rolling window?

cs95
osteocyt
This has nothing to do with the pandas rolling window itself. What is your function supposed to do? At the moment, `data` is a NumPy array of float values, which makes `x` a float, yet you try to index it with `x[0]` etc. as if it were a list or an array. – Mr. T Jan 31 '18 at 09:42
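The point made in this comment can be checked with a small sketch. It assumes pandas 0.23 or later, where `rolling(...).apply` accepts `raw=True` and then hands the function a plain NumPy array. The toy frame and the helper names `inspect_window` and `seen` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data, two columns are enough to show the effect)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

seen = []  # records the shape of every window passed to the function

def inspect_window(window):
    # Each call receives the values of ONE column inside ONE window
    arr = np.asarray(window)
    seen.append((arr.ndim, len(arr)))
    return arr.sum()

result = df.rolling(2).apply(inspect_window, raw=True)

# Every window was one-dimensional, so its elements are scalars,
# and indexing a NumPy scalar raises the IndexError from the question.
raised = False
try:
    np.float64(3.0)[0]
except IndexError:
    raised = True
```

Because the function only ever sees one column at a time, there is no way to reach the other two columns from inside it; the columns have to be combined before the rolling step.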

1 Answer


Try this: convert the DataFrame to a Series that holds one array of row values per index entry, apply the function to each row, and then take the rolling sum:

def my_test(r):
    # r holds the A, B, C values of one row; divide the norm by 10 as in the question
    return int(np.sqrt(sum(r**2)) / 10)

dfs = pd.Series(data=[df.loc[x].values for x in df.index], index=df.index)
dfs.apply(my_test).rolling(10).sum()
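The same result can probably be reached without building an intermediate Series of arrays. The following is a minimal vectorized sketch, assuming the goal is the quantity from the question, `int(sqrt(A**2 + B**2 + C**2) / 10)` per row, summed over each 10-row window. The fixed seed and the name `row_score` are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the random frame in the question (seed is an assumption)
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 100, size=(50, 3)), columns=list("ABC"))

# Row-wise Euclidean norm of the three columns, divided by 10 and truncated.
# All values are non-negative, so astype(int) truncates exactly like int().
row_score = (np.sqrt((df ** 2).sum(axis=1)) / 10).astype(int)

# Sum the per-row scores over 10-row windows; the first 9 entries are NaN
entr = row_score.rolling(10).sum()
```

Computing the per-row value first and rolling second keeps the rolling step one-dimensional, which is the only shape `rolling` works on per column anyway.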
drublackberry