
I have the following dataframe:

import pandas as pd

d_test = {
    'name' : ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5]
}
df_test = pd.DataFrame(d_test)

I am looking for a way to add a column duplicate with True/False for each entry. I want True only if the same 'name' appears under more than one 'group' number. Here is the expected output:

    name    group   duplicate
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True

In the example above, row 0 has True in duplicate because its name is the same as in row 8 and the group numbers differ (1 and 5). Row 3 has False in duplicate because its only duplicate (row 5) is in the same group, 3.

jezrael
illuminato

1 Answer


There seems to be something wrong in your example. Perhaps what you want is possible with the following code:

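# For each name, True if it appears under more than one distinct group
result = df_test.groupby('name')['group'].transform(lambda x: x.nunique() > 1)
# assign returns a new DataFrame; df_test itself is left unchanged
df_test.assign(duplicated=result)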

Output of df_test.assign(duplicated=result):

    name    group   duplicated
0   bob     1       True
1   rob     4       False
2   dan     3       True
3   steeve  3       False
4   carl    2       True
5   steeve  3       False
6   dan     2       True
7   carl    1       True
8   bob     5       True
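
As a possible variation on the code above, the lambda can probably be dropped by passing the built-in 'nunique' aggregation to transform and comparing the result with 1. This is only a sketch under two assumptions of mine: the column is named duplicate as in the question (not duplicated), and it is written back to df_test rather than returned as a copy. The intermediate name groups_per_name is made up for illustration.

```python
import pandas as pd

df_test = pd.DataFrame({
    'name': ['bob', 'rob', 'dan', 'steeve', 'carl', 'steeve', 'dan', 'carl', 'bob'],
    'group': [1, 4, 3, 3, 2, 3, 2, 1, 5],
})

# Count the distinct groups each name appears in, broadcast back to every row.
# (groups_per_name is a hypothetical helper name, not from the original answer.)
groups_per_name = df_test.groupby('name')['group'].transform('nunique')

# True when the name occurs in more than one distinct group.
df_test['duplicate'] = groups_per_name.gt(1)

print(df_test)
```

With the sample data this should flag the same rows as the lambda version, since both test whether a name has more than one distinct group.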
Panda Kim
  • @jezrael Anyway, the code and the output text are different. I think there is something wrong, and I think this is what he wants. – Panda Kim Dec 14 '22 at 06:02