
This might be a very simple problem, but I cannot find the solution: I want to add a new column "col_new" whose calculation depends on a group variable such as a group ID or a date. In other words, the operation should change depending on the group.
Example:

   Year  col1  col2
0  2019    10     1
1  2019     4     2
2  2019    25     1
3  2018     3     1
4  2017    56     2
5  2017     3     2


- for Year = 2017: col_new = col1-col2
- for Year = 2018: col_new = col1+col2
- for Year = 2019: col_new = col1*col2
I would also like to wrap this in a for loop:

year = [2017, 2018, 2019]
for x in year:
    df["new_col"] = ...  # the calculation should depend on x
  • Tried using if statements: they always require an else, so each iteration overwrites the values from the previous one.
  • Using .loc works, but it becomes very hard to handle with long and complex conditions.
  • Tried setting Year as the index. That is easy to do, but then I am stuck.
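The for loop from the question can work if each iteration writes only to its own rows: a boolean mask restricts the `.loc` assignment to the current year, so no else branch is needed and earlier iterations are never overwritten. This is a minimal sketch, assuming the rules listed above (2017 subtract, 2018 add, 2019 multiply); rows with a year that is not in the list keep NaN.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})

# Start with NaN so rows whose year is not handled below stay visibly empty.
df["new_col"] = float("nan")

year = [2017, 2018, 2019]
for x in year:
    mask = df["Year"] == x  # True only for the rows of this year
    a = df.loc[mask, "col1"]
    b = df.loc[mask, "col2"]
    # Only the masked rows are written, so other years are left untouched.
    if x == 2017:
        df.loc[mask, "new_col"] = a - b
    elif x == 2018:
        df.loc[mask, "new_col"] = a + b
    elif x == 2019:
        df.loc[mask, "new_col"] = a * b

print(df)
```

Because new_col starts as NaN it is a float column; once every row is filled it can be cast back with `df["new_col"].astype("int64")`.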
import pandas as pd
import numpy as np

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}
df = pd.DataFrame(data=d) #the example dataframe
df = df.set_index("Year")
print(df)
      col1  col2
Year            
2019    10     1
2019     4     2
2019    25     1
2018     3     1
2017    56     2
2017     3     2

Now I need something like:
- if 2017 then col1+col2
- if 2018 then col1-col2
- if 2019 then col1*col2

Martin Flower

3 Answers


Use a dict of operators:

from operator import sub, add, mul

op = {2019: mul, 2018: add, 2017: sub}

df.assign(new_col=[op[t.Year](t.col1, t.col2) for t in df.itertuples()])

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1
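A variant of the same dict-of-operators idea avoids the per-row Python loop: group by Year and apply each operator to a whole group's columns at once. This is a sketch, assuming Year is a regular column (as in the example above) and the row index has no duplicate labels; every year present must have an entry in op, otherwise `op[year]` raises a KeyError.

```python
from operator import add, mul, sub

import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]})

op = {2019: mul, 2018: add, 2017: sub}

# One vectorized operation per Year group; each result keeps its original row labels.
pieces = [op[year](g["col1"], g["col2"]) for year, g in df.groupby("Year")]

# pd.concat stacks the pieces; the assignment realigns them to df's row order by index.
df["new_col"] = pd.concat(pieces)
print(df)
```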

If Year is in the index

df.assign(new_col=[op[t.Index](t.col1, t.col2) for t in df.itertuples()])

      col1  col2  new_col
Year                     
2019    10     1       10
2019     4     2        8
2019    25     1       25
2018     3     1        4
2017    56     2       54
2017     3     2        1
piRSquared

You can use numpy.select

cond = [df.index == 2017, df.index == 2018, df.index == 2019]
choice = [df.col1+df.col2, df.col1-df.col2, df.col1*df.col2]
df['new'] = np.select(cond, choice)



      col1  col2  new
Year                 
2019    10     1   10
2019     4     2    8
2019    25     1   25
2018     3     1    2
2017    56     2   58
2017     3     2    5
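When the list of years grows, cond and choice can be built from a single dict so each condition stays next to its calculation. This is a sketch under the same rules as this answer (Year in the index, 2017 add, 2018 subtract, 2019 multiply); the name `rules` is hypothetical. Passing `default=np.nan` makes rows with an unlisted year show up as NaN instead of np.select's default of 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Year': [2019, 2019, 2019, 2018, 2017, 2017],
                   'col1': [10, 4, 25, 3, 56, 3],
                   'col2': [1, 2, 1, 1, 2, 2]}).set_index("Year")

# Hypothetical mapping: year -> calculation on the whole frame.
rules = {
    2017: lambda d: d.col1 + d.col2,
    2018: lambda d: d.col1 - d.col2,
    2019: lambda d: d.col1 * d.col2,
}

cond = [df.index == year for year in rules]     # one boolean mask per year
choice = [calc(df) for calc in rules.values()]  # same order as cond
df["new"] = np.select(cond, choice, default=np.nan)
print(df)
```

Any expression that returns a Series aligned with df can go into rules, including a group-wise `transform`.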
Vaishali
    Great. This also works with complex operations and stays readable: ```choice = [df.groupby("col2")["col1"].transform("sum"), df.col1-df.col2, df.col1*df.col2]``` – Martin Flower Sep 03 '19 at 17:56

You can use the pandas apply function. Note that I commented out the line that sets Year as the index, so Year stays a regular column.

import pandas as pd

d = {'Year': [2019, 2019, 2019, 2018, 2017, 2017],
     'col1': [10, 4, 25, 3, 56, 3],
     'col2': [1, 2, 1, 1, 2, 2]}

df = pd.DataFrame(data=d)  # the example dataframe
# df = df.set_index("Year")  # keep Year as a regular column


def check(row):
    # row is one line of df; select values by column label, not by position
    if row['Year'] == 2017:
        return row['col1'] - row['col2']
    elif row['Year'] == 2018:
        return row['col1'] + row['col2']
    elif row['Year'] == 2019:
        return row['col1'] * row['col2']


# check must be defined before it is passed to apply
df['new_col'] = df.apply(check, axis=1)
print(df)

Result:

   Year  col1  col2  new_col
0  2019    10     1       10
1  2019     4     2        8
2  2019    25     1       25
3  2018     3     1        4
4  2017    56     2       54
5  2017     3     2        1
J.K