I have two DataFrames of different lengths. I would like to compare them and delete from df1 the rows that have no matching row in df2.
Here is an example:
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})
df1
  Filename Name values
0   image1  Dog      2
1   image1  Cat      3
2   image2  Cat      4
3   image3  Cat      5
df2
  Filename Name values
0   image1  Dog      5
1   image2  Cat      6
2   image3  Cat      7
I'm expecting two DataFrames (df1 and df2) with the same length and the same Filename and Name, as below. My goal is to compare the values column of df1 and df2 for rows with the same Filename and Name.
df1
  Filename Name values
0   image1  Dog      2
2   image2  Cat      4
3   image3  Cat      5
df2
  Filename Name values
0   image1  Dog      5
1   image2  Cat      6
2   image3  Cat      7
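One possible way to get this result is sketched below. It assumes that "available" means the (Filename, Name) pair of a df1 row also occurs in df2, and it uses a left merge with indicator=True to flag such rows. The names keys, flags, mask and df1_filtered are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})

# Assumption: a row counts as a match when both key columns agree.
keys = ['Filename', 'Name']

# Left-merge df1 against the unique key pairs of df2. indicator=True adds a
# '_merge' column that is 'both' when the df1 row has a match in df2.
flags = df1.merge(df2[keys].drop_duplicates(), on=keys, how='left', indicator=True)

# The right side has unique keys, so the merge yields one row per df1 row in
# df1's order; the flags can therefore be applied positionally.
mask = (flags['_merge'] == 'both').to_numpy()
df1_filtered = df1[mask]

print(df1_filtered)
```

Because the mask is applied positionally, df1_filtered keeps the original index labels of df1 (0, 2, 3 on the sample data).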
I tried comparing each row with the rows of the other DataFrame and deleting it if no match is found. (This is clearly not the right way to do it.)
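for i, row1 in df1.iterrows():
    found = False
    # Look for a df2 row with the same Filename and Name.
    for _, row2 in df2.iterrows():
        if row2['Filename'] == row1['Filename'] and row2['Name'] == row1['Name']:
            found = True
            break
    if not found:
        print('delete')
        # Drop the unmatched row from df1 by its index label.
        df1 = df1.drop(i)
df1 = df1.sort_values('Filename')
df2 = df2.sort_values('Filename')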
I also tried using groupby and comparing the rows, but I encountered ValueError: Can only compare identically-labeled Series objects, since the indexes are not the same.
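A possible way around the index mismatch, sketched under the assumption that each (Filename, Name) pair occurs at most once per DataFrame: merge the two frames on the key columns so that both values columns sit side by side in one frame, then compare them there. The names merged and df1_smaller, the suffixes, and the "less than" comparison are only illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({'Filename': ['image1', 'image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat', 'Cat'],
                    'values': ['2', '3', '4', '5']})
df2 = pd.DataFrame({'Filename': ['image1', 'image2', 'image3'],
                    'Name': ['Dog', 'Cat', 'Cat'],
                    'values': ['5', '6', '7']})

keys = ['Filename', 'Name']

# An inner merge keeps only the key pairs present in both frames and places
# the two 'values' columns in the same row, so no index alignment is needed.
merged = df1.merge(df2, on=keys, how='inner', suffixes=('_df1', '_df2'))

# The sample values are strings, so convert them before a numeric comparison.
merged['df1_smaller'] = (merged['values_df1'].astype(int)
                         < merged['values_df2'].astype(int))

print(merged)
```

Both columns now belong to the same frame and share one index, so the comparison does not depend on the original index labels of df1 and df2.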
Can someone please help me with this? I tried searching for similar problems but did not come across any.