-3

I have a dataframe:

id  author    publication_year  article_years
1   John Doe  2000              21
1   John Doe  2010              11
2   John Foo  2015              6
2   John Foo  1980              31
3   John Lee  2020              1
3   John Lee  2019              2

I want to create a new column, activity_years, that holds the maximum article_years value for each author; that maximum counts as the author's total years of activity. For example, if an author first published in 1980, their activity is 31 years, counted from that first publication.

Expected output:

id  author    publication_year  article_years  activity_years
1   John Doe  2000              21             21
1   John Doe  2010              11             21
2   John Foo  2015              6              31
2   John Foo  1980              31             31
3   John Lee  2020              1              2
3   John Lee  2019              2              2
Anakin Skywalker

2 Answers

0
df['activity_years'] = df.groupby('author')['article_years'].transform('max')

Output:

>>> df
   id    author  publication_year  article_years  activity_years
0   1  John Doe              2000             21              21
1   1  John Doe              2010             11              21
2   2  John Foo              2015              6              31
3   2  John Foo              1980             31              31
4   3  John Lee              2020              1               2
5   3  John Lee              2019              2               2
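
To reproduce this end to end, here is a minimal sketch. It rebuilds the question's table by hand (how `df` is constructed is my assumption, since the question only shows the printed frame) and passes the string 'max', which recent pandas versions prefer over the builtin `max`.

```python
import pandas as pd

# Sample data copied from the question; building it from a dict is an assumption.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "author": ["John Doe", "John Doe", "John Foo", "John Foo", "John Lee", "John Lee"],
    "publication_year": [2000, 2010, 2015, 1980, 2020, 2019],
    "article_years": [21, 11, 6, 31, 1, 2],
})

# transform returns a Series with the same index as df, so each row
# receives the maximum of its own group instead of one row per group.
df["activity_years"] = df.groupby("author")["article_years"].transform("max")

print(df)
```

The reason for transform rather than a plain `.max()` is that `.max()` collapses each group to a single row, which would then have to be merged back onto the original frame.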
-1

Try the following code to generate the column:
df['activity_years'] = df.groupby('id')['article_years'].transform('max')
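
Grouping by id gives the same result as grouping by author on the question's data, but the two can differ if distinct authors share a name. The sketch below uses made-up rows (not from the question) to illustrate that case.

```python
import pandas as pd

# Hypothetical data: ids 1 and 2 are different people with the same name.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "author": ["John Doe", "John Doe", "John Doe", "John Doe"],
    "article_years": [21, 11, 6, 3],
})

# Grouping by id keeps the two people apart.
by_id = df.groupby("id")["article_years"].transform("max")

# Grouping by author merges them into a single group.
by_author = df.groupby("author")["article_years"].transform("max")

print(pd.DataFrame({"by_id": by_id, "by_author": by_author}))
```

If id is a reliable unique key per author, grouping on it is probably the safer choice.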

uaBArt
    Don't answer duplicate posts. https://stackoverflow.com/questions/35640364/python-pandas-max-value-in-a-group-as-a-new-column – Wilian Dec 09 '21 at 18:32