Well. In this thread, answers are needed only if you have a faster solution to this case =) I have a dataframe that has the following columns - status (contains the values "ok" and "not_responding"), a concat column (contains a concatenated value that can be perceived as an id) and columns that contain values with certain results for a specific period, these columns are indicated in the following format "week number + R/S/F/U prefixes". This dataframe initially contains a certain number of duplicates, due to the fact that one id can have a value in the status column "ok" and "not_responding". And at one of the stages of working with a frame, I need to match it with another frame by the concat column (adding columns W* R/S/F/U, Columns W* A is already was here). And here the fun begins - I do not need duplication of results in columns with a week number and a row prefix. If the values of duplicates in the concat column are equal to each other both in the "ok" status and in the "not_responding" status, i need to delete the result in the "not_responding" status. After a lot of research, I could not find an alternative to the index (+ match) from Google spreadsheets in pandas and I made a clumsy function that cycled through the entire frame for several minutes, checking the results and the status of the rows by id. That how it looks:
for i in range(4):
wnum1 = f'{week[i]} R'
wnum2 = f'{week[i]} S'
wnum3 = f'{week[i]} F'
ind1 = df.loc[df.status == 'ok', 'concat'].reset_index(drop=True)
ind2 = df.loc[df.status == 'not_responding', 'concat'].reset_index(drop=True)
D1 = df.loc[df.status == 'ok', wnum1].reset_index(drop=True)
D2 = df.loc[df.status == 'not_responding', wnum1].reset_index(drop=True)
for j in ind2:
if len(df.loc[df.concat == j]) > 1:
if D2.iloc[ind2[ind2==j].index[0]] == D1.iloc[ind1[ind1==j].index[0]]:
df.loc[(df.status == 'not_responding') & (df.concat == j), wnum1] = pd.NA
df.loc[(df.status == 'not_responding') & (df.concat == j), wnum2] = pd.NA
df.loc[(df.status == 'not_responding') & (df.concat == j), wnum3] = pd.NA
However, in the end I found an option that works for fractions of seconds using masks. In general, I hope this helps some newbie like me find a faster solution =)
for i in range(4):
wnum1 = f'{week[i]} R'
wnum2 = f'{week[i]} S'
wnum3 = f'{week[i]} U'
mask1 = df.loc[:, ['concat', wnum1]].duplicated(keep='first')
mask2 = df.loc[:, ['concat', wnum2]].duplicated(keep='first')
mask3 = df.loc[:, ['concat', wnum3]].duplicated(keep='first')
df.loc[mask1, wnum1] = pd.NA
df.loc[mask2, wnum2] = pd.NA
df.loc[mask3, wnum3] = pd.NA