I'm trying to find the repeated ids based on the first day's value.
For example, I have records for 4 days:
import pandas as pd
df = pd.DataFrame({'id':['1','2','5','4','2','3','5','4','2','5','2','3','3','4'],
'class':['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
'day':['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})
df
Given the above data, I'd like to keep the records that meet the following conditions: (1) all records on day 1 that have class = 0; (2) on days 2, 3, and 4, the records whose id satisfies condition (1), i.e. has class = 0 on day 1.
So the results should be:
# expected output (named `expected` so it does not overwrite the input `df`)
expected = pd.DataFrame({'id':['5','4','4','5','4'],
'class':['0','0','1','0','1'],
'day':['1','1','2','3','4']})
expected
This method would work:
# 1. find unique id in day 1 that meet condition (1)
df1 = df[(df['day']=='1') & (df['class']=='0')]
df1_id = df1.id.unique()
# 2. create a new dataframe for day 2,3,4
df234=df[df['day']!='1']
# 3. create a new dataframe for day2,3,4 that contains the id in the unique list
df234_new = df234[df234['id'].isin(df1_id)]
# 4. append df234_new at the end of df1 (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df_new = pd.concat([df1, df234_new])
df_new
But my full dataset contains many more columns and rows, and the above method seems too tedious. Does anyone know how to do this more efficiently? Thank you very much!
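One more compact possibility, offered as a rough sketch rather than a definitive answer: build the two conditions as boolean masks over the original frame and filter once, so no intermediate DataFrames need to be split and recombined. The names `is_day1`, `day1_class0`, `keep_ids`, `later`, and `result` are made up for illustration, and the sketch assumes the columns hold strings as in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'id':    ['1','2','5','4','2','3','5','4','2','5','2','3','3','4'],
    'class': ['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
    'day':   ['1','1','1','1','1','1','1','2','2','3','3','3','4','4'],
})

# Condition (1): records on day 1 with class 0
is_day1 = df['day'] == '1'
day1_class0 = is_day1 & (df['class'] == '0')

# ids that satisfy condition (1)
keep_ids = df.loc[day1_class0, 'id'].unique()

# Condition (2): records on later days whose id is in keep_ids
later = ~is_day1 & df['id'].isin(keep_ids)

# One filter over the original frame; the original row order is preserved
result = df[day1_class0 | later].reset_index(drop=True)
```

Because the filter is a single mask on `df`, any extra columns in the full dataset are carried along without further changes.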