0

I am trying to find the unique number of items in each 'group' of ID's. So in the code below I am trying to find the unique number of demographics (A, B, C) for each value of id_match (101, 201, 26).

tst = pd.DataFrame({'demographic' : ['A', 'B', 'B', 'A', 'C', 'C'],
            'id_match' : ['101', '101', '201', '201', '26', '26']})

tst['num_unq'] = tst.groupby('demographic')['id_match'].nunique()

Expected output

  demographic  id_match  num_unq
1     A          101        2
2     B          101        2
3     B          201        2
4     A          201        2
5     C          26         1
6     C          26         1

However instead of the expected output i simply get a columns of NaN's. Does anyone know why this happens and also an alternative method?

Thanks J

JDraper
  • 349
  • 2
  • 11

1 Answers1

1

Use transform:

tst = pd.DataFrame({'demographic' : ['A', 'B', 'B', 'A', 'C', 'C'],
            'id_match' : ['101', '101', '201', '201', '26', '26']})

tst['num_unq'] = tst.groupby('demographic')['id_match'].transform('nunique')

print(tst)

Output

  demographic id_match  num_unq
0           A      101        2
1           B      101        2
2           B      201        2
3           A      201        2
4           C       26        1
5           C       26        1
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76