0

I have a dataframe like:

   date region code name
0  1      a      1    x
1  2      a      1    y
2  1      b      1    y
3  2      b      1    w
4  1      c      1    y
5  2      c      1    y
6  1      a      2    z
7  2      a      2    z

Each code should have a unique name, however it can change over time. I want to select the last name for each region and code combination. It is important to notice that some codes only appear in certain regions.

   date region code name
0  1      a      1    y   (changed)
1  2      a      1    y
2  1      b      1    w   (changed)
3  2      b      1    w
4  1      c      1    y 
5  2      c      1    y   
6  1      a      2    z
7  2      a      2    z

I tried doing it using some for loops with no success.

justsomeguy
  • 513
  • 4
  • 11
  • defining your time column as a DateTime would help you in sorting your data, then just group by your needed column – Bourhano Dec 09 '22 at 13:57

1 Answers1

1

Use GroupBy.transform with GroupBy.last:

#if necessary sorting
#df = df.sort_values(['region','code','date'])

df['name'] = df.groupby(['region','code'])['name'].transform('last')
print (df)
   date region  code name
0     1      a     1    y
1     2      a     1    y
2     1      b     1    w
3     2      b     1    w
4     1      c     1    y
5     2      c     1    y
6     1      a     2    z
7     2      a     2    z
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252