Find records for repeated ids in the following days based on the first day's value using Python

Question

I'm trying to find the repeated ids based on the first day's value.

For example, I have records for 4 days:

import pandas as pd
df = pd.DataFrame({'id':['1','2','5','4','2','3','5','4','2','5','2','3','3','4'], 
                   'class':['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
                   'day':['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})
df

Given the above data, I'd like to find the records that fit the following conditions: (1) all the records on day 1 that have class = 0; (2) on days 2, 3, and 4, keep the records whose id satisfies condition (1), i.e. has class = 0 on day 1.

So the results should be:

# expected result (named separately so it does not overwrite the input df)
expected = pd.DataFrame({'id':['5','4','4','5','4'], 
                         'class':['0','0','1','0','1'],
                         'day':['1','1','2','3','4']})
expected

This method would work:

# 1. find unique id in day 1 that meet condition (1)
df1 = df[(df['day']=='1') & (df['class']=='0')] 

df1_id = df1.id.unique()

# 2. create a new dataframe for day 2,3,4 
df234=df[df['day']!='1'] 

# 3. create a new dataframe for day2,3,4 that contains the id in the unique list 
df234_new = df234[df234['id'].isin(df1_id)]

# 4. append df234_new at the end of df1 (pd.concat, since DataFrame.append was removed in pandas 2.0)
df_new = pd.concat([df1, df234_new])

df_new

But my full dataset contains many more columns and rows, and the above method seems too tedious. Does anyone know how to do it more efficiently? Thank you very much!

You could combine steps 2-4 e.g. `df_new = df[(df['day']=='1') & (df['class']=='0') | (df['day'] != '1') & (df['id'].isin(df1_id))] ` — Nick, Mar 09 '21 at 04:18
Thank you Nick! It worked perfectly on my test dataset. Using the real dataset caused some trouble, the df_new has 0 row. Or even using my previous method, all the generated dataframes have 0 rows — goosepea, Mar 09 '21 at 18:34
You'll need to provide a sample of the data which isn't working to debug that... — Nick, Mar 10 '21 at 00:06
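
For reference, the combined-mask idea from the comments can be written out as a self-contained sketch. It assumes, as in the sample data, that `id`, `class`, and `day` are all stored as strings; the variable names `first_day_zero`, `ids`, and `later_match` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id':    ['1','2','5','4','2','3','5','4','2','5','2','3','3','4'],
                   'class': ['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
                   'day':   ['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})

# Condition (1): rows on day 1 with class 0.
first_day_zero = (df['day'] == '1') & (df['class'] == '0')

# Ids that satisfy condition (1).
ids = df.loc[first_day_zero, 'id'].unique()

# Condition (2): rows on later days whose id is in that set.
later_match = (df['day'] != '1') & df['id'].isin(ids)

# Keep rows matching either condition, in their original order.
df_new = df[first_day_zero | later_match].reset_index(drop=True)
print(df_new)
```

On the sample data this gives the five rows listed in the question. If the same code returns 0 rows on another dataset, one thing worth checking (a guess, since the real data is not shown) is `df.dtypes`: if `day` or `class` hold integers rather than strings, comparisons against `'1'` and `'0'` match nothing, and the literals would need to be `1` and `0` instead.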
