I have the below dataframe with many columns such as 2016_x, 2016_y, 2017_x, etc., where x represents my actual values and y represents the forecast values. How would I compute the mean squared error (MSE) row-wise, so I can see it for each fruit? Here is the code that builds the sample data-
import pandas as pd

# Sample data: _x columns are actual values, _y columns are forecasts
s = {'Fruits': ['Apple', 'Mango'],
     '2016_x': [2, 3], '2017_x': [4, 5], '2018_x': [12, 13],
     '2016_y': [3, 4], '2017_y': [3, 4], '2018_y': [12, 13]}
p = pd.DataFrame(data=s)
This is what the dataframe looks like-

  Fruits  2016_x  2017_x  2018_x  2016_y  2017_y  2018_y
0  Apple       2       4      12       3       3      12
1  Mango       3       5      13       4       4      13
The desired output should show the MSE for Apple and Mango, i.e. one value per row. For each fruit, the MSE should be based on the differences between the x and y values of the same year. Basically, I need the total MSE for Apple and for Mango respectively.
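To make the desired output concrete, this is the per-row calculation worked out by hand from the sample data above:

Apple: ((2-3)^2 + (4-3)^2 + (12-12)^2) / 3 = (1 + 1 + 0) / 3 ≈ 0.667
Mango: ((3-4)^2 + (5-4)^2 + (13-13)^2) / 3 = (1 + 1 + 0) / 3 ≈ 0.667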
I know that, with numpy imported as np, MSE can generally be calculated as-
MSE = np.mean((p['x'] - p['y'])**2, axis=1)
But how would I calculate it for this type of dataframe?
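For what it's worth, the closest I can imagine is something along these lines. It is only a rough sketch, and it assumes every actual column ends in _x and has a matching forecast column ending in _y for the same year:

# Pair the actual (_x) and forecast (_y) columns by year
x_cols = sorted(c for c in p.columns if c.endswith('_x'))
y_cols = sorted(c for c in p.columns if c.endswith('_y'))

# Strip the suffixes so both blocks share the same year labels,
# then take the row-wise mean of the squared differences
actuals = p[x_cols].rename(columns=lambda c: c[:-2])
forecasts = p[y_cols].rename(columns=lambda c: c[:-2])
p['MSE'] = ((actuals - forecasts) ** 2).mean(axis=1)

print(p[['Fruits', 'MSE']])
#   Fruits       MSE
# 0  Apple  0.666667
# 1  Mango  0.666667

Is this a reasonable direction, or is there a cleaner pandas way to do it?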