I have the following dataframe:
import pandas as pd

# Sample data: names repeated across one or more groups
d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)
I am looking for a way to add a column duplicate with True/False for each entry. I want True only when a 'name' appears more than once and those occurrences belong to more than one 'group' number. Here is the expected output:
     name  group  duplicate
0     bob      1       True
1     rob      4      False
2     dan      3       True
3  steeve      3      False
4    carl      2       True
5  steeve      3      False
6     dan      2       True
7    carl      1       True
8     bob      5       True
In the example above, row 0 has True in duplicate because its name is the same as in row 8 and the group numbers are different (1 and 5). Row 3 has False in duplicate because no duplicates of its name exist outside of the same group 3.
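For reference, the rule described above could be sketched as "flag a row when its name occurs in more than one distinct group". This is only one possible reading of the requirement, not a confirmed solution. It assumes that counting distinct 'group' values per 'name' with groupby and transform('nunique') captures the intent; the data is the same sample as above.

```python
import pandas as pd

# Same sample data as in the question
d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)

# For every row, count how many distinct groups its name appears in.
# transform() returns a Series aligned with the original index,
# so it can be compared and assigned back directly.
groups_per_name = df_test.groupby('name')['group'].transform('nunique')

# Assumption: "duplicate" means the name spans more than one distinct group
df_test['duplicate'] = groups_per_name > 1

print(df_test)
```

Under this reading, a name that repeats only within a single group (steeve, group 3) and a name that occurs once (rob) both get False, because each maps to exactly one distinct group.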