I have written a script that plots a histogram of the residuals from a regression of NO2 on Temperature, using a dataframe called nighttime.

The histogram shows the roughly normal distribution of the residuals from a regression line fitted elsewhere in the Python script.

I am struggling to find a way to plot a bell curve over the histogram, as in this example:

Plot Normal distribution with Matplotlib

How can I get a fitting normal distribution for my residual histogram?

plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)

WSx_rm = nighttime['Temperature']                                        
WSx_rm = sm.add_constant(WSx_rm)   
NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing = 'drop').fit() 
NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid)) 
#Histogram of residuals
ax = plt.hist(NO2_WS_RM_mod.resid)
plt.xlim(-40,50)
plt.xlabel('Residuals')
plt.show()
Isabella

2 Answers

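You can use the seaborn library to plot the histogram with a smooth density curve drawn over it. Note that the curve seaborn draws is a kernel density estimate, which looks bell-shaped when the data are roughly normal, rather than a fitted normal distribution. It is not clear to me which variable holds the residuals in your example, so the snippet below is for reference only.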

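import matplotlib.pyplot as plt
import seaborn as sns

# y_actual and y_predicted are arbitrary arrays used only to explain this example
residuals = y_actual - y_predicted

# distplot is deprecated since seaborn 0.11; histplot with kde=True replaces it
sns.histplot(residuals, bins=10, kde=True, stat='density')  # you may choose the number of bins
plt.title('Error Terms', fontsize=20)
plt.xlabel('Residuals', fontsize=15)
plt.show()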
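To make `residuals = y_actual - y_predicted` concrete, here is a minimal sketch that fits a straight line to synthetic data with NumPy and takes observed minus predicted values. The data, the seed and the names `x`, `slope` and `intercept` are made up for illustration; they are not the `nighttime` dataframe from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): a linear trend plus Gaussian noise
x = np.linspace(0.0, 30.0, 200)
y_actual = 50.0 - 1.2 * x + rng.normal(0.0, 5.0, size=x.size)

# Least-squares line fit; for deg=1 polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_predicted = slope * x + intercept

# Residuals are observed minus predicted values
residuals = y_actual - y_predicted
```

The resulting `residuals` array can then be passed to the plotting call in place of the placeholder variables.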
amit haldar

Does the following work for you? (using some adapted code from the link you gave)

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm

plt.suptitle('NO2 and Temperature Residuals night-time', fontsize=20)

WSx_rm = nighttime['Temperature']                                        
WSx_rm = sm.add_constant(WSx_rm)   
NO2_WS_RM_mod = sm.OLS(nighttime.NO2, WSx_rm, missing = 'drop').fit() 
NO2_WS_RM_mod_sr = (NO2_WS_RM_mod.resid / np.std(NO2_WS_RM_mod.resid)) 
#Histogram of residuals
# density=True normalises the bars to unit area so they share the PDF's scale
ax = plt.hist(NO2_WS_RM_mod.resid, density=True)
plt.xlim(-40,50)
plt.xlabel('Residuals')

# New Code: Draw fitted normal distribution
residuals = sorted(NO2_WS_RM_mod.resid) # Just in case it isn't sorted
normal_distribution = stats.norm.pdf(residuals, np.mean(residuals), np.std(residuals))
plt.plot(residuals, normal_distribution)

plt.show()
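For the overlay to line up, the histogram and the curve need the same vertical scale: a normal PDF integrates to 1, so the histogram has to be drawn as a density rather than as raw counts. Below is a self-contained sketch of that idea, assuming synthetic residuals in place of `NO2_WS_RM_mod.resid`. The helper name `plot_residual_histogram` and the random data are hypothetical and only serve the illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_residual_histogram(residuals, bins=10):
    """Histogram of residuals on a density scale with a fitted normal curve."""
    residuals = np.asarray(residuals, dtype=float)

    # Maximum-likelihood estimates of the normal mean and standard deviation
    mu, sigma = stats.norm.fit(residuals)

    fig, ax = plt.subplots()
    # density=True normalises the bars so their total area is 1, like the PDF
    ax.hist(residuals, bins=bins, density=True, alpha=0.6)

    # Evaluate the PDF on an evenly spaced grid so the curve is smooth
    grid = np.linspace(residuals.min(), residuals.max(), 200)
    ax.plot(grid, stats.norm.pdf(grid, mu, sigma))
    ax.set_xlabel('Residuals')
    return fig, ax, mu, sigma


# Synthetic residuals (hypothetical), standing in for NO2_WS_RM_mod.resid
rng = np.random.default_rng(0)
fig, ax, mu, sigma = plot_residual_histogram(rng.normal(0.0, 10.0, size=500))
```

With the model from the question, the call would presumably be `plot_residual_histogram(NO2_WS_RM_mod.resid)` followed by `plt.show()`.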
Robin Spiess