3

I have a dataframe which represents features of a linear regression model.

df1 = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'feature1': [1000,2000,4000,3000,5000,2000,8000,2000,4000,3000],
                         'feature2': [9000,7000,3000,1000,2000,3000,6000,8000,1000,1000],
                         'feature3': [3000,1000,2000,5000,9000,7000,2000,3000,5000,9000]})

enter image description here

I run the model and calculate the coefficients which produces another dataframe, below.

df2 = pd.DataFrame({'feature': ['feature1','feature2','feature3'],
                         'coefficient': [-1,2,0.5]})

enter image description here

I then want to produce a third dataframe where the contents are the product of the values from df1 and the corresponding coefficients from df2. Desired output below.

df3 = pd.DataFrame({'yyyyww': ['2022-01','2022-02','2022-03', '2022-04','2022-05','2022-06','2022-07','2022-08','2022-09','2022-10'],
                         'feature1': [-1000,-2000,-4000,-3000,-5000,-2000,-8000,-2000,-4000,-3000],
                         'feature2': [18000,14000,6000,2000,4000,6000,12000,16000,2000,2000],
                         'feature3': [1500,500,1000,2500,4500,3500,1000,1500,2500,4500]})

enter image description here

I have tried to achieve this using mul and multiply in the following manner, however this does not produce the desired result.

features = [['feature1', 'feature2', 'feature3']]

results = pd.DataFrame()

for cols in features:
    results[cols] = df1[cols] 

results = df1.mul(df2['coefficient'], axis =0)
results
jimiclapton
  • 775
  • 3
  • 14
  • 42

3 Answers3

3

Try this using pandas intrinsic data alignment tenet:

df1.set_index('yyyyww').mul(df2.set_index('feature')['coefficient'])

Output:

         feature1  feature2  feature3
yyyyww                               
2022-01   -1000.0   18000.0    1500.0
2022-02   -2000.0   14000.0     500.0
2022-03   -4000.0    6000.0    1000.0
2022-04   -3000.0    2000.0    2500.0
2022-05   -5000.0    4000.0    4500.0
2022-06   -2000.0    6000.0    3500.0
2022-07   -8000.0   12000.0    1000.0
2022-08   -2000.0   16000.0    1500.0
2022-09   -4000.0    2000.0    2500.0
2022-10   -3000.0    2000.0    4500.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
1

One possible solution:

coeff = df2.set_index("feature")["coefficient"].to_dict()

for c in df1.loc[:, "feature1":]:
    df1[c] *= coeff[c]

print(df1)

Prints:

    yyyyww  feature1  feature2  feature3
0  2022-01   -1000.0   18000.0    1500.0
1  2022-02   -2000.0   14000.0     500.0
2  2022-03   -4000.0    6000.0    1000.0
3  2022-04   -3000.0    2000.0    2500.0
4  2022-05   -5000.0    4000.0    4500.0
5  2022-06   -2000.0    6000.0    3500.0
6  2022-07   -8000.0   12000.0    1000.0
7  2022-08   -2000.0   16000.0    1500.0
8  2022-09   -4000.0    2000.0    2500.0
9  2022-10   -3000.0    2000.0    4500.0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

You can indeed use index alignment. If you want to update in place use:

df1[df2['feature']] *= df2.set_index('feature')['coefficient']

If you want a new DataFrame, I'd use @Scott's approach, as it's often nicer to have the dates as index. Nevertheless, here is another option:

df3 = (df1
    .mul(df2.set_index('feature')['coefficient'])
    .combine_first(df1)[df1.columns]
 )

Output:

    yyyyww  feature1  feature2  feature3
0  2022-01   -1000.0   18000.0    1500.0
1  2022-02   -2000.0   14000.0     500.0
2  2022-03   -4000.0    6000.0    1000.0
3  2022-04   -3000.0    2000.0    2500.0
4  2022-05   -5000.0    4000.0    4500.0
5  2022-06   -2000.0    6000.0    3500.0
6  2022-07   -8000.0   12000.0    1000.0
7  2022-08   -2000.0   16000.0    1500.0
8  2022-09   -4000.0    2000.0    2500.0
9  2022-10   -3000.0    2000.0    4500.0
mozway
  • 194,879
  • 13
  • 39
  • 75