
I am having difficulty adding the regression line fitted by statsmodels OLS onto a scatter plot. With seaborn's lmplot I can get a line (see the example below), but for total consistency I would like to draw the exact line that comes from the statsmodels OLS fit.

How can I adjust the code below to add that regression line to the first scatter plot?

import statsmodels.api as sm  # exposes both OLS and add_constant
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)

data = {'Xvalue': range(20, 30), 'Yvalue': np.random.randint(low=10, high=100, size=10)}

data = pd.DataFrame(data)

X = data[['Xvalue']]
Y = data['Yvalue']
model2 = sm.OLS(Y, sm.add_constant(X))  # pass endog/exog directly; data= belongs to the formula API (smf.ols)
model_fit = model2.fit()
print(model_fit.summary())

#Plot
data.plot(kind='scatter', x='Xvalue', y='Yvalue')

#Seaborn
sns.lmplot(x='Xvalue', y='Yvalue', data=data)

Scatter plot (trying to work out how to add the statsmodels OLS regression line)

Seaborn lmplot with its regression line (the result I am trying to mimic)

Asked by dsnOwhiskey (edited by halfer)

1 Answer


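Thanks to the link from @busybear, it now works. The intercept and slope come from model_fit.params and are drawn with ax.plot on the same axes that data.plot returns: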

import statsmodels.api as sm  # exposes both OLS and add_constant
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(0)

data = {'Xvalue': range(20, 30), 'Yvalue': np.random.randint(low=10, high=100, size=10)}

data = pd.DataFrame(data)

X = sm.add_constant(data['Xvalue'])  # adds a 'const' column for the intercept
Y = data['Yvalue']
model = sm.OLS(Y, X)
model_fit = model.fit()
p = model_fit.params  # Series indexed by 'const' (intercept) and 'Xvalue' (slope)
print(model_fit.summary())


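#Plot the data, then the fitted line y = const + slope * x on the same axes
x = np.arange(0, 40)
ax = data.plot(kind='scatter', x='Xvalue', y='Yvalue')
ax.plot(x, p.const + p.Xvalue * x)
ax.set_xlim([0, 30])  # start the axis at 0 so the intercept is visible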

#Seaborn
sns.lmplot(x='Xvalue', y='Yvalue', data=data)


Answered by dsnOwhiskey