How can we measure RMSE in Python?

Question

I am experimenting with Kalman filters. I have created a very small time-series dataset with three columns, formatted as follows. Since I can't attach a file on Stack Overflow, the full dataset is linked here for reproducibility:

csv file

  time        X      Y
 0.040662  1.041667  1
 0.139757  1.760417  2
 0.144357  1.190104  1
 0.145341  1.047526  1
 0.145401  1.011882  1
 0.148465  1.002970  1
 ....      .....     .

I have read the documentation of `KalmanFilter` and managed to do a simple linear prediction. Here is my code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pykalman import KalmanFilter

df = pd.read_csv('testdata.csv')
print(df)

# Treat +/-inf as missing and drop those rows
# (the 'use_inf_as_null' option is not available in recent pandas versions)
df = df.replace([np.inf, -np.inf], np.nan).dropna()

X = df.drop('Y', axis=1)  # remaining columns: time, X
y = df['Y']

# measurements[:, 0] is the time column, measurements[:, 1] is the X column
measurements = X.to_numpy()

# 1-D random-walk model: the state is observed directly, with noise
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
                  transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=measurements[0, 1],
                  initial_state_covariance=1,
                  observation_covariance=5,
                  transition_covariance=1)

# Filter the X column; state_means has shape (n, 1)
state_means, state_covariances = kf.filter(measurements[:, 1])
state_std = np.sqrt(state_covariances[:, 0])
print(state_std)
print(state_means)
print(state_covariances)

fig, ax = plt.subplots()
ax.margins(x=0, y=0.05)

ax.plot(measurements[:, 0], measurements[:, 1], '-r', label='Real Value Input')
ax.plot(measurements[:, 0], state_means, '-b', label='Kalman-Filter')
ax.legend(loc='best')
ax.set_xlabel("Time")
ax.set_ylabel("Value")
plt.show()

This gives the following plot as output:

As the plot shows, the filter seems to capture the pattern reasonably well. How can I compute the root-mean-square error (RMSE), i.e. the error between the red and blue lines in the plot above? Any help would be appreciated.
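For reference, the quantity being asked for can be written as follows, with $z_i$ the $i$-th measurement (the red line, `measurements[:, 1]`), $\hat{x}_i$ the corresponding filtered state mean (the blue line, `state_means`), and $n$ the number of time steps:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( z_i - \hat{x}_i \right)^2 }
$$

Squaring before averaging penalizes large deviations more than small ones, and the square root puts the result back in the units of `X`.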

to find the RMSE between two NumPy arrays `x` and `y` you can do `np.sqrt(np.mean((x-y)**2))`. — overfull hbox, Dec 29 '18 at 15:40
are all of the entries in your arrays regular numbers, or are there some `inf` or `NaN`? — overfull hbox, Dec 29 '18 at 15:59
@TylerChen, yes, they are all regular numbers. I have included the small dataset with my post for reproducibility. It is only about 400 rows, so it won't take long to re-run and check whether it works for you. Thanks. — , Dec 29 '18 at 16:03
could you post the two arrays you want to find the RMSE of? I don't have `pykalman` installed. — overfull hbox, Dec 29 '18 at 16:10
Note `X` has more than one column (it still contains `time`). Maybe you want to do it between `x=df['X']` and `y=df['Y']`? But in your plot it isn't this `y`. — overfull hbox, Dec 29 '18 at 16:16
In your plot the red line is `X` (`measurements[:,1]`) and the blue line is `state_means`, which came from the filter. — overfull hbox, Dec 29 '18 at 16:29
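For readers without `pykalman` installed, here is a minimal plain-NumPy sketch of the same scalar filter, written to mirror the question's settings (transition and observation matrices of 1, `initial_state_covariance=1`, `transition_covariance=1`, `observation_covariance=5`). The helper `scalar_kalman_filter` is hypothetical and hand-written, not pykalman's own code. It assumes pykalman uses the initial state directly as the first prediction, so treat exact agreement with `kf.filter` as an assumption. It filters the six `X` values from the preview above and then computes the RMSE between measurements and estimates.

```python
import numpy as np


def scalar_kalman_filter(z, x0, p0=1.0, q=1.0, r=5.0):
    """Hypothetical helper: 1-D random-walk Kalman filter with F = H = 1.

    x0, p0, q and r play the roles of initial_state_mean,
    initial_state_covariance, transition_covariance and
    observation_covariance in the question's KalmanFilter call.
    Returns the filtered means and variances as 1-D arrays.
    """
    z = np.asarray(z, dtype=float).ravel()
    means = np.empty_like(z)
    variances = np.empty_like(z)
    x, p = float(x0), float(p0)
    for t, obs in enumerate(z):
        if t > 0:
            # Predict: a random walk keeps the mean and adds process noise q
            p = p + q
        # Update: the Kalman gain weighs the prediction against the observation
        k = p / (p + r)
        x = x + k * (obs - x)
        p = (1.0 - k) * p
        means[t] = x
        variances[t] = p
    return means, variances


# The six X values shown in the data preview in the question
z = np.array([1.041667, 1.760417, 1.190104, 1.047526, 1.011882, 1.002970])
means, variances = scalar_kalman_filter(z, x0=z[0])

# RMSE between the measurements (red line) and the filter output (blue line)
rmse = np.sqrt(np.mean((z - means) ** 2))
print(rmse)
```

Because `means` is 1-D here, `z - means` pairs the elements one-to-one.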

score 0 · Accepted Answer · answered Dec 29 '18 at 16:55


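`mean_squared_error` returns the mean squared error (MSE), so take its square root to get the RMSE:

import numpy as np
from sklearn.metrics import mean_squared_error

# red line (filter input) vs. blue line (filter output)
rmse = np.sqrt(mean_squared_error(measurements[:, 1], state_means))
print(rmse)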
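If you prefer not to depend on scikit-learn, a minimal plain-NumPy sketch of the same computation follows; the helper name `rmse` is hypothetical. It flattens both inputs first, because pykalman's `state_means` has shape `(n, 1)` and NumPy broadcasting would turn `measurements[:, 1] - state_means` into an `(n, n)` matrix, silently producing a wrong number. It also refuses `inf`/`NaN` inputs, which would otherwise propagate into the result.

```python
import numpy as np


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    # Flatten both inputs so an (n, 1) column is paired element-wise with an
    # (n,) vector instead of broadcasting to an (n, n) matrix
    a = np.asarray(actual, dtype=float).ravel()
    p = np.asarray(predicted, dtype=float).ravel()
    if a.shape != p.shape:
        raise ValueError(f"length mismatch: {a.shape} vs {p.shape}")
    # inf or NaN would make the result inf or NaN, so fail loudly instead
    if not (np.isfinite(a).all() and np.isfinite(p).all()):
        raise ValueError("inputs contain inf or NaN")
    return np.sqrt(np.mean((a - p) ** 2))


# Toy example: the errors are 0, 0 and 2, so RMSE = sqrt(4 / 3), about 1.1547
print(rmse([1.0, 2.0, 3.0], [[1.0], [2.0], [5.0]]))
```

In the question's script this would be called as `rmse(measurements[:, 1], state_means)`.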

Venkatachalam

score 0 · Answer 2 · answered Nov 19 '20 at 10:19

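Since scikit-learn 0.22.0, you can pass `squared=False` to `mean_squared_error()` to return the RMSE directly. (In scikit-learn 1.4 and later, `squared` is deprecated; use `sklearn.metrics.root_mean_squared_error` instead.)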

from sklearn.metrics import mean_squared_error
mean_squared_error(y_actual, y_predicted, squared=False)
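Because the `squared` parameter changed across releases, a version-tolerant sketch may help: it uses `root_mean_squared_error` when the installed scikit-learn provides it (1.4 and later) and otherwise falls back to a locally defined function of the same name that takes the square root of `mean_squared_error`. The fallback definition and the toy arrays are illustrative, not part of scikit-learn.

```python
import numpy as np

try:
    # scikit-learn >= 1.4 provides a dedicated RMSE function
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # Older releases: define a local stand-in that takes the square root of the MSE
    from sklearn.metrics import mean_squared_error

    def root_mean_squared_error(y_true, y_pred):
        return np.sqrt(mean_squared_error(y_true, y_pred))


# Toy data: the errors are 0, 0 and 2, so RMSE = sqrt(4 / 3), about 1.1547
y_actual = [1.0, 2.0, 3.0]
y_predicted = [1.0, 2.0, 5.0]
print(root_mean_squared_error(y_actual, y_predicted))
```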
