I'm manually passing specific values from a pandas DataFrame to a function. This works, but I'm hoping to make the process more efficient. Specifically, I first subset each run of consecutive values in Item. I then take the respective values in Val and pass them to func. This produces the value I need. This is OK for smaller DataFrames but becomes inefficient for larger datasets.
I'm hoping to make this process more efficient and to apply the resulting values back to the original df.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Time' : ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15'],
'Val' : [35,38,31,30,35,31,32,34,36,38,39,30,25,26,27],
'Item' : ['X','X','X','X','X','Y','Y','Y','Y','Y','Y','X','X','X','X'],
})
df1 = df.groupby([df['Item'].ne(df['Item'].shift()).cumsum(), 'Item']).size()
X1 = df[0:5]
Y1 = df[5:11]
X2 = df[11:15]
V1 = X1['Val'].reset_index(drop = True)
V2 = Y1['Val'].reset_index(drop = True)
V3 = X2['Val'].reset_index(drop = True)
def func(U, m = 2, r = 0.2):
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])
    def _phi(m):
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0)**(-1) * sum(np.log(C))
    N = len(U)
    return abs(_phi(m + 1) - _phi(m))
print(func(V1))
print(func(V2))
print(func(V3))
out:
0.287682072452
0.223143551314
0.405465108108
If I just try to apply the function using groupby, it raises KeyError: 0. The function doesn't work unless I reset the index.
df1 = df.groupby(['Item']).apply(func)
KeyError: 0
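A small sketch of what I suspect is behind the KeyError: func reads U[j] with integer keys starting at 0, and pandas treats an integer key on an integer-labelled Series as a label, not a position (on a DataFrame, which is what groupby(...).apply passes, U[0] is looked up as a column name). The Series s below is a made-up stand-in for one slice of Val, not data taken from df.

```python
import pandas as pd

# Hypothetical slice: labels start at 5, as they would for rows 5..7 of df
s = pd.Series([31, 32, 34], index=[5, 6, 7])

try:
    s[0]              # integer key is looked up as a label; label 0 does not exist
    raised = False
except KeyError:
    raised = True

# Two ways to get positional access instead
first_by_position = s.to_numpy()[0]               # index the underlying array
first_after_reset = s.reset_index(drop=True)[0]   # label 0 exists after the reset
```

This would explain why resetting the index makes func work, and it suggests that converting the input to a NumPy array inside func might avoid the reset altogether.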
Intended Output:
    Time  Val  Item   func
0      1   35     X  0.287
1      2   38     X  0.287
2      3   31     X  0.287
3      4   30     X  0.287
4      5   35     X  0.287
5      6   31     Y  0.223
6      7   32     Y  0.223
7      8   34     Y  0.223
8      9   36     Y  0.223
9     10   38     Y  0.223
10    11   39     Y  0.223
11    12   30     X  0.405
12    13   25     X  0.405
13    14   26     X  0.405
14    15   27     X  0.405
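For reference, here is a rough sketch of the direction I think this could go, not a confirmed solution. It assumes two things that are not in my code above: func converts its input with np.asarray so that U[j] is positional, and the groups are keyed on the consecutive-run counter (named run_id here, a name I made up) instead of on Item alone, so the two separate X runs stay separate. It relies on groupby(...).transform broadcasting a scalar result back to every row of its group.

```python
import numpy as np
import pandas as pd

def func(U, m=2, r=0.2):
    U = np.asarray(U)  # assumption: positional indexing, so no index reset is needed
    N = len(U)

    def _maxdist(x_i, x_j):
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # All length-m windows of U, then the fraction of windows within r of each one
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    return abs(_phi(m + 1) - _phi(m))

df = pd.DataFrame({
    'Time': ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15'],
    'Val': [35,38,31,30,35,31,32,34,36,38,39,30,25,26,27],
    'Item': ['X','X','X','X','X','Y','Y','Y','Y','Y','Y','X','X','X','X'],
})

# The counter increases by one every time Item changes from the previous row
run_id = df['Item'].ne(df['Item'].shift()).cumsum()

# One scalar per run, broadcast back onto the rows of that run
df['func'] = df.groupby(run_id)['Val'].transform(func)
print(df)
```

This only removes the manual slicing. func itself still compares every window against every other window, so very long runs would presumably remain slow.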