New to Python here and trying to see if there is a more elegant solution.

I have time-series data from telematics devices that includes a motion indicator. I need to expand the motion indicator to +/- 1 row around the actual motion start and stop (denoted by the motion2 column below). I was doing this in SQL using a case statement and the lead and lag window functions, and am now trying to convert my code to Python...

Here is the data:

import pandas as pd

data = {'device':[1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2], 
    'time':[1,2,3,4,5,6,7,8,9,10,11,12,5,6,7,8,9,10,11,12,13,14],
    'motion':[0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,0,1,0,0]}

df = pd.DataFrame.from_dict(data)
df = df[['device','time','motion']]

##sort data chronologically for each device
df.sort_values(['device','time'], ascending = True, inplace = True)

This is what df looks like:

device, time, motion
1,1,0
1,2,0
1,3,1
1,4,1
1,5,1
1,6,0
1,7,0
1,8,0
1,9,1
1,10,1
1,11,0
1,12,0
2,5,0
2,6,0
2,7,0
2,8,1
2,9,1
2,10,1
2,11,0
2,12,1
2,13,0
2,14,0

What I need is the motion2 column below added to the data frame.

device, time, motion, motion2
1,1,0,0
1,2,0,1
1,3,1,1
1,4,1,1
1,5,1,1
1,6,0,1
1,7,0,0
1,8,0,1
1,9,1,1
1,10,1,1
1,11,0,1
1,12,0,0
2,5,0,0
2,6,0,0
2,7,0,1
2,8,1,1
2,9,1,1
2,10,1,1
2,11,0,1
2,12,1,1
2,13,0,1
2,14,0,0

Below is the Python code that works. However, I'm wondering if there is a more elegant way.

##create new columns for prior and next motion indicator
df['prev_motion'] = df.groupby(['device'])['motion'].shift(1)
df['next_motion'] = df.groupby(['device'])['motion'].shift(-1)

##create the desired motion2 indicator to expand +/- 1 record of the motion start and stop

df['motion2'] = df[['prev_motion', 'motion', 'next_motion']].apply(
    lambda row: 1 if row['motion'] == 1
    else (1 if row['prev_motion'] == 1 or row['next_motion'] == 1 else 0),
    axis=1)

##drop unwanted columns        
df.drop(columns=['prev_motion', 'next_motion'], inplace = True)

This was much easier in SQL using a case statement and window functions (lead and lag):

case 
when motion = 1 then 1
when motion = 0 and (lead(motion) over (partition by device order by time) = 1) then 1
when motion = 0 and (lag(motion) over (partition by device order by time) = 1) then 1
else 0
end as motion2
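
The lead/lag-over-partition pattern above seems to map directly onto pandas' grouped shift; a minimal sketch of that translation (an assumption on my part, with the grouping standing in for partition by device and the same df as above):

```python
import pandas as pd

# same data as in the question
df = pd.DataFrame({
    'device': [1]*12 + [2]*10,
    'time':   list(range(1, 13)) + list(range(5, 15)),
    'motion': [0,0,1,1,1,0,0,0,1,1,0,0, 0,0,0,1,1,1,0,1,0,0],
})
df.sort_values(['device', 'time'], inplace=True)

# lead(motion)/lag(motion) over (partition by device) == grouped shift(-1)/shift(1)
g = df.groupby('device')['motion']
df['motion2'] = (df['motion'].eq(1)    # when motion = 1 then 1
                 | g.shift(-1).eq(1)   # lead(motion) = 1
                 | g.shift(1).eq(1)    # lag(motion)  = 1
                 ).astype(int)         # else 0
```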
nikeshpraj

2 Answers


This isn't necessarily the most elegant, but it works: find any point where motion is 1, or where motion shifted by 1 in either direction is 1. Here are two ways to do that with numpy functions (note that numpy functions don't require explicitly importing numpy, as they are also built into pandas and can be accessed via pd.np, but see @Abhi's comment for a pure pandas equivalent):

df['motion2'] = pd.np.where(df.motion.values | pd.np.roll(df.motion.values, 1) | pd.np.roll(df.motion.values, -1), 1, 0)

# The following is essentially equivalent, but maybe a bit clearer / more efficient
df['motion2'] = pd.np.stack((df.motion.values, pd.np.roll(df.motion.values, 1), pd.np.roll(df.motion.values, -1))).any(0).astype(int)

>>> df
    device  time  motion  motion2
0        1     1       0        0
1        1     2       0        1
2        1     3       1        1
3        1     4       1        1
4        1     5       1        1
5        1     6       0        1
6        1     7       0        0
7        1     8       0        1
8        1     9       1        1
9        1    10       1        1
10       1    11       0        1
11       1    12       0        0
12       2     5       0        0
13       2     6       0        0
14       2     7       0        1
15       2     8       1        1
16       2     9       1        1
17       2    10       1        1
18       2    11       0        1
19       2    12       1        1
20       2    13       0        1
21       2    14       0        0
sacuL
    `pd.concat([df.motion.shift(-1),df.motion.shift(1),df.motion],axis=1).max(axis=1)` – Abhi Oct 20 '18 at 04:38
    @Abhi, that would work, it's essentially the same underlying mechanism as what I proposed, just using `pandas` functions rather than `numpy`. As using `numpy` functions doesn't require explicitly importing it as a separate package (they can be accessed with `pd.np.whatever_function()...`), I prefer those, but it's personal preference :) – sacuL Oct 20 '18 at 04:40
    It's good. I haven't played around with numpy a lot so I usually use pandas most of the times. :) +1 – Abhi Oct 20 '18 at 04:46
    Yeah, it's nice you pointed it out, I referenced your comment in an edit, in case someone is strongly opposed to the `numpy` methods. Thanks! – sacuL Oct 20 '18 at 04:48
  • I realized this wasn't a perfectly accurate solution. I also need to partition the window by device. It looks like if the last motion row for device 1 = 1 and the first row for device 2 = 0, device 2's first row still gets motion2 = 1 even when device 2's second row = 0. How would you partition by device and still perform the same operations? Thanks again for the help! – nikeshpraj Oct 20 '18 at 15:48
  • I hadn't seen that requirement in your original question, sorry! You could do: `df['motion2'] = df.groupby('device')['motion'].transform(lambda x: pd.np.stack((x.values, pd.np.roll(x.values,1), pd.np.roll(x.values,-1))).any(0).astype(int))` – sacuL Oct 20 '18 at 16:02
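
Combining the two ideas from the comments above (Abhi's pure-pandas concat/max and the per-device grouping), a sketch that partitions by device without any numpy calls; the grouped shifts stay inside each device, so there is no roll-style wraparound between devices:

```python
import pandas as pd

# the question's data
df = pd.DataFrame({
    'device': [1]*12 + [2]*10,
    'time':   list(range(1, 13)) + list(range(5, 15)),
    'motion': [0,0,1,1,1,0,0,0,1,1,0,0, 0,0,0,1,1,1,0,1,0,0],
})

# max(axis=1) skips the NaN edges that the grouped shifts introduce,
# so the first/last row of each device falls back to its own motion value
g = df.groupby('device')['motion']
df['motion2'] = (pd.concat([g.shift(-1), g.shift(1), df['motion']], axis=1)
                   .max(axis=1)
                   .astype(int))
```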

One option to replicate SQL's CASE WHEN is case_when from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd

In [5]: df.case_when(
   ...:     df.motion.eq(1), 1,
   ...:     df.motion.eq(0) & df.groupby('device').motion.shift(-1), 1,
   ...:     df.motion.eq(0) & df.groupby('device').motion.shift(), 1,
   ...:     0,
   ...:     column_name = 'motion2')
Out[5]:
    device  time  motion  motion2
0        1     1       0        0
1        1     2       0        1
2        1     3       1        1
3        1     4       1        1
4        1     5       1        1
5        1     6       0        1
6        1     7       0        0
7        1     8       0        1
8        1     9       1        1
9        1    10       1        1
10       1    11       0        1
11       1    12       0        0
12       2     5       0        0
13       2     6       0        0
14       2     7       0        1
15       2     8       1        1
16       2     9       1        1
17       2    10       1        1
18       2    11       0        1
19       2    12       1        1
20       2    13       0        1
21       2    14       0        0
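
Without the extra dependency, the same CASE WHEN cascade can be written with numpy.select (a sketch over the question's data; like SQL's CASE, conditions are checked in order and the first match wins):

```python
import numpy as np
import pandas as pd

# the question's data
df = pd.DataFrame({
    'device': [1]*12 + [2]*10,
    'time':   list(range(1, 13)) + list(range(5, 15)),
    'motion': [0,0,1,1,1,0,0,0,1,1,0,0, 0,0,0,1,1,1,0,1,0,0],
})

g = df.groupby('device')['motion']
conditions = [
    df['motion'].eq(1),                      # when motion = 1
    df['motion'].eq(0) & g.shift(-1).eq(1),  # lead(motion) = 1
    df['motion'].eq(0) & g.shift(1).eq(1),   # lag(motion)  = 1
]
df['motion2'] = np.select(conditions, [1, 1, 1], default=0)
```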
sammywemmy