I am new to Python and am trying to fit a simple linear regression. My model has one dependent variable and one independent variable. Using `linear_model.LinearRegression()` from the sklearn package, I got an R-squared of 0.16. Then I fit the same data with statsmodels (`import statsmodels.api as sm`, `mod = sm.OLS(Y_train, X_train)`) and got an R-squared of 0.61. Below is the code, starting from loading the data from BigQuery.

**Code for linear regression**

    import numpy as np
    import pandas as pd
    from sklearn import linear_model
    from sklearn.metrics import r2_score

    # `query` and `project_id` are assumed to be defined earlier (not shown)
    train_data_df = pd.read_gbq(query, project_id)
    train_data_df.head()

    # Reshape each column into an (n_samples, 1) array
    X_train = train_data_df.revisit_next_day_rate.values[:, np.newaxis]
    Y_train = train_data_df.demand_1yr_per_new_member.values[:, np.newaxis]

    # scikit-learn version to get prediction R2
    model_sci = linear_model.LinearRegression()
    model_sci.fit(X_train, Y_train)

    print(model_sci.intercept_)
    print('Coefficients: \n', model_sci.coef_)
    # Square the residuals first, then average them
    print("Mean squared error: %.2f"
          % np.mean((model_sci.predict(X_train) - Y_train) ** 2))
    print('Variance score: %.2f' % model_sci.score(X_train, Y_train))
    Y_train_predict = model_sci.predict(X_train)
    print('R Square', r2_score(Y_train, Y_train_predict))


**For OLS (statsmodels)**

    import statsmodels.api as sm

    print(Y_train[:3])
    print(X_train[:3])
    mod = sm.OLS(Y_train, X_train)
    res = mod.fit()
    print(res.summary())

I am very new to this. Which linear regression package should I use?

SAM244776
  • You need to show us what you've actually done, or else how could anyone say what you've done wrong? – juanpa.arrivillaga Jan 05 '17 at 22:34
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. – Prune Jan 05 '17 at 22:42
  • @juanpa.arrivillaga I have edited the question to add the code. – SAM244776 Jan 06 '17 at 00:08
  • So you know, sklearn's `LinearRegression` fits an intercept term by default while `sm.OLS` does not. – Nick Becker Jan 06 '17 at 00:29
  • Is there a way to make OLS fit the intercept? – SAM244776 Jan 06 '17 at 01:02

1 Answer

I found the difference: it was the intercept. `sm.OLS` does not include one by default, so after adding the code below, the results from the two packages matched.

    X = sm.add_constant(X)
    sm.OLS(y, X)

SAM244776
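
It may seem odd that the fit without an intercept reported the higher R-squared. One likely reason is that statsmodels documents `rsquared` as using the uncentered total sum of squares when the model has no constant, so the two numbers are not measured against the same baseline. The following is a minimal pure-Python sketch of that arithmetic, not the internals of either library. The data is made up for illustration, and all variable names are hypothetical.

```python
# Made-up toy data (NOT the data from the question)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 11.0, 14.0, 13.0, 16.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Fit WITH an intercept: y ~ a + b * x (closed-form simple regression)
b = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
     / sum((xi - x_mean) ** 2 for xi in x))
a = y_mean - b * x_mean
ssr_with = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Fit WITHOUT an intercept: y ~ b0 * x (line forced through the origin)
b0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
ssr_without = sum((yi - b0 * xi) ** 2 for xi, yi in zip(x, y))

# Two baselines for "total" variation
centered_tss = sum((yi - y_mean) ** 2 for yi in y)   # around the mean of y
uncentered_tss = sum(yi ** 2 for yi in y)            # around zero

r2_with_intercept = 1 - ssr_with / centered_tss
r2_no_intercept_uncentered = 1 - ssr_without / uncentered_tss
r2_no_intercept_centered = 1 - ssr_without / centered_tss

print("with intercept (centered):    %.3f" % r2_with_intercept)
print("no intercept   (uncentered):  %.3f" % r2_no_intercept_uncentered)
print("no intercept   (centered):    %.3f" % r2_no_intercept_centered)
```

On this toy data, the no-intercept fit scores higher against the uncentered baseline even though its residuals are much larger, which is why R-squared values from models with and without a constant should not be compared directly.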