Pandas: how to add column with Booleans (True/False) based on duplicates in one column and group index in another column

Question

I have the following dataframe:

import pandas as pd

d_test = {
    'name' : ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)

I am looking for a way to add a column 'duplicate' with True/False for each entry. I want True only when a 'name' occurs more than once and those occurrences belong to more than one 'group' number. Here is the expected output:

    name    group   duplicate
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True

In the example above, row 0 has True in 'duplicate' because its name is the same as in row 8 and the group numbers differ (1 and 5). Row 3 has False because its only duplicate (row 5) is in the same group, 3.
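One way to express this rule directly is sketched below. It is not from the original post; it assumes the rule means "the name appears in at least two distinct groups". Collapsing repeated (name, group) pairs first makes any remaining repeated name a cross-group duplicate.

```python
import pandas as pd

d_test = {
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
}
df_test = pd.DataFrame(d_test)

# Keep one row per (name, group) pair, so repeats inside the same group collapse.
unique_pairs = df_test.drop_duplicates(subset=['name', 'group'])

# A name that still occurs more than once must span several groups.
multi_group_names = unique_pairs.loc[unique_pairs['name'].duplicated(keep=False), 'name']

# Flag every original row whose name is in that set.
df_test['duplicate'] = df_test['name'].isin(multi_group_names)
print(df_test)
```

Both Steeve rows collapse into one (steeve, 3) pair, so 'steeve' is never repeated after deduplication and stays False.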

What does `Row 3 has False in duplicate because no duplicates exist outside of the same group 3.` mean? – jezrael, Dec 14 '22 at 05:56
@jezrael we have two Steeves, rows 3 and 5. Without the `group` column they would get `True`, because they are duplicates. Here they should not, because we consider an entity a duplicate only if it appears outside of a single group – illuminato, Dec 14 '22 at 05:59
[Likely what you want](https://stackoverflow.com/questions/64128529/find-duplicate-rows-among-different-groups-with-pandas) – mozway, Dec 14 '22 at 06:02
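To illustrate the point about ignoring `group`, here is a small sketch (not from the thread) using plain `Series.duplicated` on the name column alone. With `keep=False` every occurrence of a repeated name is marked, so both Steeve rows come out True, unlike the expected output.

```python
import pandas as pd

df_test = pd.DataFrame({
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
})

# Name-only check: 'group' is ignored, so any repeated name is flagged,
# including steeve in rows 3 and 5 even though both are in group 3.
name_only = df_test['name'].duplicated(keep=False)
print(name_only.tolist())
```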

Panda Kim · Accepted Answer · 2022-12-14T06:07:05.610


There seems to be something wrong in your example. Perhaps what you want can be done with the following code:

# True for every row whose name occurs in more than one distinct group
result = df_test.groupby('name')['group'].transform(lambda x: x.nunique() > 1)
df_test.assign(duplicated=result)

Output of df_test.assign(duplicated=result):

    name    group   duplicated
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True
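A variant of the same idea, sketched here as an alternative rather than the answerer's code: passing the string 'nunique' to `transform` uses pandas' built-in groupby method instead of calling a Python lambda per group. Note also that `assign` returns a new DataFrame, so the result has to be bound to a name for the column to persist.

```python
import pandas as pd

df_test = pd.DataFrame({
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
})

# Number of distinct groups per name, broadcast back to every row.
n_groups = df_test.groupby('name')['group'].transform('nunique')

# assign() does not modify df_test in place, so rebind the result.
df_test = df_test.assign(duplicated=n_groups > 1)
print(df_test)
```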

answered Dec 14 '22 at 05:58 by Panda Kim, edited Dec 14 '22 at 06:07


@jezrael anyway, the code and the output text are different. I think something is wrong, and I think this is what he wants. – Panda Kim, Dec 14 '22 at 06:02
