Pandas: How to calculate new column based on index or groupID?

Question

This might be a very simple problem, but I cannot find the solution: I want to add a new column "col_new" whose calculation depends on a group variable such as a groupID or a date. So the calculation should change depending on the groupID.
Example:

   Year  col1  col2
0  2019    10     1
1  2019     4     2
2  2019    25     1
3  2018     3     1
4  2017    56     2
5  2017     3     2

- for Year = 2017: col_new = col1-col2
- for Year = 2018: col_new = col1+col2
- for Year = 2019: col_new = col1*col2
Also I want to wrap this up in a for loop.

year = [2017, 2018, 2019]
for x in year:
    df["new_col"] = ................

I tried using if statements, but they always require an else, so each iteration changes the values set by the previous iteration.
Using .loc works, but it becomes very hard to handle with long and complex conditions.
I also tried setting Year as the index. This is easy to do, but then I am stuck.

import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d) #the example dataframe
df = df.set_index("Year")
print(df)

      col1  col2
Year            
2019    10     1
2019     4     2
2019    25     1
2018     3     1
2017    56     2
2017     3     2

Now I need something like:
- if 2017 then col1+col2
- if 2018 then col1-col2
- if 2019 then col1*col2

score 5 · Answer 1 · answered Sep 03 '19 at 17:27

`dict` of operators

from operator import sub, add, mul

op = {2019: mul, 2018: add, 2017: sub}

df.assign(new_col=[op[t.Year](t.col1, t.col2) for t in df.itertuples()])

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1

If Year is in the index

df.assign(new_col=[op[t.Index](t.col1, t.col2) for t in df.itertuples()])

      col1  col2  new_col
Year                     
2019    10     1       10
2019     4     2        8
2019    25     1       25
2018     3     1        4
2017    56     2       54
2017     3     2        1
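Since the question asks for a for loop, here is a minimal sketch of the same `op` dict driven by a loop over the years, applying each operator to a whole group at once instead of row by row. It assumes Year is a regular column; the names `result` and `mask` are mine, not from the answer above.

```python
from operator import add, mul, sub

import pandas as pd

df = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2018, 2017, 2017],
    "col1": [10, 4, 25, 3, 56, 3],
    "col2": [1, 2, 1, 1, 2, 2],
})

op = {2019: mul, 2018: add, 2017: sub}

# Start with an all-NaN column; years missing from `op` stay NaN.
result = pd.Series(index=df.index, dtype="float64")

for year, fn in op.items():
    mask = df["Year"] == year  # rows belonging to this year
    # Apply the operator to the whole group in one vectorized call.
    result.loc[mask] = fn(df.loc[mask, "col1"], df.loc[mask, "col2"]).to_numpy()

df["new_col"] = result
print(df)
```

Each iteration only writes to the rows of its own year, so nothing from a previous iteration gets overwritten. Note that the new column comes out as float because it starts as NaN.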

Love the use of the operator library here. Very clever – Patrick H Sep 03 '19 at 17:42

score 2 · Accepted Answer · answered Sep 03 '19 at 17:29

You can use numpy.select

cond = [df.index == 2017, df.index == 2018, df.index == 2019]
choice = [df.col1+df.col2, df.col1-df.col2, df.col1*df.col2]
df['new'] = np.select(cond, choice)



      col1  col2  new
Year
2019    10     1   10
2019     4     2    8
2019    25     1   25
2018     3     1    2
2017    56     2   58
2017     3     2    5
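One thing to be aware of: `np.select` fills rows that match none of the conditions with its `default`, which is 0 unless you say otherwise. Below is a small sketch that passes `default=np.nan` so unmatched years are easy to spot. The extra 2016 row is a hypothetical addition of mine for illustration only; it is not part of the question's data.

```python
import numpy as np
import pandas as pd

d = {"Year": [2019, 2019, 2019, 2018, 2017, 2017, 2016],  # 2016 row is hypothetical
     "col1": [10, 4, 25, 3, 56, 3, 7],
     "col2": [1, 2, 1, 1, 2, 2, 3]}
df = pd.DataFrame(data=d).set_index("Year")

cond = [df.index == 2017, df.index == 2018, df.index == 2019]
choice = [df.col1 + df.col2, df.col1 - df.col2, df.col1 * df.col2]

# Rows whose year matches no condition get NaN instead of the default 0.
df["new"] = np.select(cond, choice, default=np.nan)
print(df)
```

With a NaN default the column becomes float, and a plain `df["new"].isna()` shows which years are not covered yet.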

Vaishali


Great. This also works with complex operations and stays readable: `choice = [df.groupby("col2")["col1"].transform(sum), df.col1-df.col2, df.col1*df.col2]` – Martin Flower Sep 03 '19 at 17:56
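For reference, the variant from this comment can be written out as a complete sketch like the one below. It uses the string `"sum"` instead of the builtin `sum`, which as far as I know is the spelling newer pandas versions prefer; everything else follows the accepted answer.

```python
import numpy as np
import pandas as pd

d = {"Year": [2019, 2019, 2019, 2018, 2017, 2017],
     "col1": [10, 4, 25, 3, 56, 3],
     "col2": [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d).set_index("Year")

cond = [df.index == 2017, df.index == 2018, df.index == 2019]
choice = [
    # For 2017: sum of col1 over all rows sharing the same col2 value.
    df.groupby("col2")["col1"].transform("sum"),
    df.col1 - df.col2,
    df.col1 * df.col2,
]
df["new"] = np.select(cond, choice)
print(df)
```

Every entry in `choice` is computed for all rows first, and `np.select` then picks the matching one per row, so the group sum is taken over the whole frame and not only over the 2017 rows.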

score 1 · Answer 3 · answered Sep 03 '19 at 18:40

You can use the pandas apply function. Note that I commented out the line that sets Year as the index.

import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}

df = pd.DataFrame(data=d) #the example dataframe
#df = df.set_index("Year")
#print(df)

# check has to be defined before it is passed to apply
def check(row):
    # look up the values by column label instead of by position
    if row['Year'] == 2017:
        return row['col1'] - row['col2']
    elif row['Year'] == 2018:
        return row['col1'] + row['col2']
    elif row['Year'] == 2019:
        return row['col1'] * row['col2']

df['new_col'] = df.apply(check, axis=1)
df

Result :

    Year    col1    col2    new_col
0   2019    10       1      10
1   2019    4        2      8
2   2019    25       1      25
3   2018    3        1      4
4   2017    56       2      54
5   2017    3        2      1
