Questions tagged [statsmodels]

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.

Homepage: http://www.statsmodels.org/

An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator. Features include:

  • Linear regression models
  • Generalized linear models
  • Discrete choice models
  • Robust linear models
  • Many models and functions for time series analysis
  • Nonparametric estimators
  • A collection of datasets for examples
  • A wide range of statistical tests
  • Input-output tools for producing tables in a number of formats (Text, LaTeX, HTML) and for reading Stata files into NumPy and pandas.
  • Plotting functions
  • Extensive unit tests to ensure correctness of results
  • Many more models and extensions in development
2841 questions
40 votes, 9 answers

Converting statsmodels summary object to Pandas Dataframe

I am doing multiple linear regression with statsmodels.formula.api (ver 0.9.0) on Windows 10. After fitting the model and getting the summary with the following lines, I get the summary as a Summary object. X_opt = X[:, [0,1,2,3]] regressor_OLS =…
Sagun Kayastha
40 votes, 7 answers

ImportError: No module named statsmodels

I downloaded the StatsModels source from this location, untarred it to /usr/local/lib/python2.7/dist-packages and, per this documentation, ran sudo python setup.py install. It installed, but when I try to import it with import statsmodels.api as sm I…
Stripers247
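A diagnostic sketch rather than a fix: one common cause of this kind of ImportError is that the package was installed for a different interpreter than the one running the import. The snippet prints the running interpreter and where, if anywhere, statsmodels would be imported from. module_location is a hypothetical helper name.

```python
import importlib.util
import sys

# Which interpreter is running this code? Compare it with the "python"
# that was used (under sudo) to run setup.py install.
print("interpreter:", sys.executable)


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("statsmodels:", module_location("statsmodels"))
```

If the second line prints None, this interpreter cannot see the installed package at all.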
38 votes, 1 answer

ANOVA in python using pandas dataframe with statsmodels or scipy?

I want to use a pandas DataFrame to break down the variance in one variable. For example, if I have a column called 'Degrees' indexed by various dates, cities, and night vs. day, I want to find out what fraction of the variation…
wolfsatthedoor
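A sketch of one way to do this with statsmodels, on made-up data (the city and period columns are hypothetical stand-ins for the question's factors): fit an OLS model with the factors wrapped in C(), then pass it to anova_lm. With typ=1 (sequential sums of squares), the rows including Residual add up to the total sum of squares, so each row's share is its fraction of the variation. Note that Type I sums of squares depend on the order of the terms in the formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical data: temperature readings by city and by day/night.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "city": rng.choice(["A", "B", "C"], size=n),
    "period": rng.choice(["day", "night"], size=n),
})
df["Degrees"] = (
    df["city"].map({"A": 10.0, "B": 15.0, "C": 20.0})
    + np.where(df["period"] == "day", 5.0, 0.0)
    + rng.normal(scale=2.0, size=n)
)

# Categorical factors are wrapped in C(); typ=1 gives sequential sums of squares.
model = smf.ols("Degrees ~ C(city) + C(period)", data=df).fit()
table = anova_lm(model, typ=1)

# Share of the total (centered) sum of squares attributed to each row.
table["fraction"] = table["sum_sq"] / table["sum_sq"].sum()
print(table)
```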
37 votes, 4 answers

Using statsmodels estimations with scikit-learn cross-validation: is it possible?

I posted this question to the Cross Validated forum and later realized it might find a more appropriate audience on Stack Overflow. I am looking for a way to use the fit object (result) obtained from Python statsmodels to feed into…
CARTman
35 votes, 2 answers

Pandas rolling regression: alternatives to looping

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS regression in an efficient manner has been asked…
Brad Solomon
32 votes, 3 answers

What are the pitfalls of using Dill to serialise scikit-learn/statsmodels models?

I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the model and make predictions. Using the pickle module is not an option because…
Nikhil
32 votes, 3 answers

Confidence interval for LOWESS in Python

How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (packages other than statsmodels are fine as well): import numpy as…
pir
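statsmodels' lowess returns only the smoothed curve, so one option is a bootstrap: resample (x, y) pairs, refit, interpolate each curve onto a common grid and take pointwise percentiles. This is a sketch of that general technique, not a built-in statsmodels feature; lowess_bootstrap_band is a hypothetical helper, the band is pointwise rather than simultaneous, and it is less reliable near the edges of the data, where np.interp simply holds the end values.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess


def lowess_bootstrap_band(x, y, grid, frac=0.3, n_boot=200, alpha=0.05, seed=0):
    """Pointwise percentile bootstrap band for a LOWESS curve (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)       # resample (x, y) pairs
        fit = lowess(y[i], x[i], frac=frac)  # sorted: [:, 0] = x, [:, 1] = smoothed y
        curves[b] = np.interp(grid, fit[:, 0], fit[:, 1])
    lower = np.percentile(curves, 100 * alpha / 2, axis=0)
    upper = np.percentile(curves, 100 * (1 - alpha / 2), axis=0)
    full = lowess(y, x, frac=frac)           # curve on the original data
    center = np.interp(grid, full[:, 0], full[:, 1])
    return center, lower, upper


# Synthetic example data.
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, size=100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)
grid = np.linspace(0.5, 9.5, 50)
center, lower, upper = lowess_bootstrap_band(x, y, grid)
```

The band can then be shaded with matplotlib, e.g. ax.fill_between(grid, lower, upper, alpha=0.3).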
31 votes, 2 answers

Why am I getting "LinAlgError: Singular matrix" from grangercausalitytests?

I am trying to run grangercausalitytests on two time series: import numpy as np import pandas as pd from statsmodels.tsa.stattools import grangercausalitytests n = 1000 ls = np.linspace(0, 2*np.pi, n) df1 = pd.DataFrame(np.sin(ls)) df2 =…
Stefan Falk
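The excerpt is cut off, so this is an assumption about the cause: if the series are noise-free sinusoids, their lagged values are exactly linearly dependent, because a sampled sine satisfies s[t] = 2*cos(h)*s[t-1] - s[t-2]. A regression on those lags then has a singular design matrix, which is what a singular-matrix error suggests. The sketch checks this with plain NumPy; lag_matrix is a hypothetical helper.

```python
import numpy as np

n = 1000
t = np.linspace(0, 2 * np.pi, n)


def lag_matrix(series, maxlag):
    """Columns are the series lagged by 1..maxlag (rows with missing lags dropped)."""
    return np.column_stack(
        [series[maxlag - k: len(series) - k] for k in range(1, maxlag + 1)]
    )


# Three lags of a pure sinusoid span only a 2-dimensional space.
clean = lag_matrix(np.sin(t), 3)
print(np.linalg.matrix_rank(clean))

# A little noise breaks the exact linear dependence.
rng = np.random.default_rng(0)
noisy = lag_matrix(np.sin(t) + rng.normal(scale=0.01, size=n), 3)
print(np.linalg.matrix_rank(noisy))
```

With real (noisy) data this exact dependence normally does not occur, so the same test is more likely to run.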
31 votes, 3 answers

OLS Regression: Scikit vs. Statsmodels?

Short version: I was using scikit-learn's LinearRegression on some data, but I'm used to p-values, so I put the data into statsmodels OLS. Although the R^2 is about the same, the variable coefficients all differ by large amounts. This…
Nat Poor
30 votes, 1 answer

How to silence statsmodels.fit() in Python

When I want to fit a model in Python, I often use the fit() method in statsmodels, and in some cases I write a script to automate the fitting: import statsmodels.formula.api as smf import pandas as pd df = pd.read_csv('mydata.csv') # contains column x…
keisuke
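Assuming the unwanted output is the optimizer's convergence message printed by maximum-likelihood models such as Logit (OLS fits do not print it), passing disp=0 to fit() suppresses it. The sketch fits a logit on made-up data and captures stdout to confirm nothing is printed.

```python
import contextlib
import io

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic binary-outcome data.
rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.normal(size=300)})
df["y"] = (df["x"] + rng.normal(size=300) > 0).astype(int)

# disp=0 turns off the "Optimization terminated successfully" style messages.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    result = smf.logit("y ~ x", data=df).fit(disp=0)

print(repr(buf.getvalue()))  # captured output of the fit
print(result.params)
```

Warnings (for example statsmodels' ConvergenceWarning) go through Python's warnings module rather than stdout, so warnings.catch_warnings() together with warnings.simplefilter("ignore") would be the tool for those.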
28 votes, 2 answers

How to plot statsmodels linear regression (OLS) cleanly

Problem Statement: I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it: Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried statsmodels' plot_fit method, but the plot is a little…
Alex Lenail
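A sketch of one clean approach on synthetic data: predict on an evenly spaced grid with get_prediction(), take the mean and 95% confidence limits from summary_frame(), and draw the scatter, fitted line and shaded band with plain matplotlib. The Agg backend and the in-memory PNG are only there so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"x": np.sort(rng.uniform(0, 10, size=60))})
df["y"] = 1.0 + 0.8 * df["x"] + rng.normal(scale=1.0, size=60)

res = smf.ols("y ~ x", data=df).fit()

# Predictions with 95% intervals on a grid of x values.
grid = pd.DataFrame({"x": np.linspace(df["x"].min(), df["x"].max(), 100)})
pred = res.get_prediction(grid).summary_frame(alpha=0.05)

fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], s=12, label="data")
ax.plot(grid["x"], pred["mean"], color="C1", label="OLS fit")
ax.fill_between(grid["x"], pred["mean_ci_lower"], pred["mean_ci_upper"],
                color="C1", alpha=0.3, label="95% CI for the mean")
ax.legend()

png = io.BytesIO()
fig.savefig(png, format="png")
```

The obs_ci_lower and obs_ci_upper columns of the same frame give a wider prediction band for individual observations.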
28 votes, 2 answers

Capturing high multi-collinearity in statsmodels

Say I fit a model in statsmodels: mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit(). When I call mod.summary(), I may see the following: Warnings: [1] The condition number is large, 1.59e+05. This might indicate that…
Amelio Vazquez-Reina
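The condition number in the warning is a single number for the whole design matrix. To see which regressors are involved, one option is the variance inflation factor from statsmodels.stats.outliers_influence, computed column by column on the fitted model's exog. This sketch builds synthetic data where b is nearly a copy of a; a frequently quoted rule of thumb treats VIF values above 5 or 10 as a sign of problematic collinearity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({"a": rng.normal(size=n)})
df["b"] = df["a"] + rng.normal(scale=0.05, size=n)  # nearly a copy of a
df["c"] = rng.normal(size=n)
df["y"] = df["a"] + df["c"] + rng.normal(size=n)

mod = smf.ols("y ~ a + b + c", data=df).fit()

# VIF of each regressor on the model's design matrix; index 0 is the
# Intercept column, which is skipped.
exog = mod.model.exog
names = mod.model.exog_names
vif = pd.Series(
    [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])],
    index=names[1:],
)
print(vif)
```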
28 votes, 3 answers

statsmodels linear regression - patsy formula to include all predictors in model

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include all of my independent variables in the model: # R…
Greg
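A common workaround, sketched with the question's DF, y and x1..x3 names on synthetic data: build the formula string from every column except the response and pass it to smf.ols.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the question's DF.
rng = np.random.default_rng(8)
DF = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x1", "x2", "x3"])
DF["y"] = DF["x1"] - DF["x2"] + 0.5 * DF["x3"] + rng.normal(scale=0.1, size=50)

# Build "y ~ x1 + x2 + x3" from every column except the response.
predictors = [col for col in DF.columns if col != "y"]
formula = "y ~ " + " + ".join(predictors)
fit = smf.ols(formula, data=DF).fit()
print(formula)
print(fit.params)
```

Column names that are not valid Python identifiers (for example names containing spaces) would need patsy's Q() quoting inside the formula.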
28 votes, 3 answers

Python statistics package: difference between statsmodels and scipy.stats

I need some advice on selecting a statistics package for Python. I've done quite a bit of searching but am not sure I have everything right, specifically on the differences between statsmodels and scipy.stats. One thing that I know is those with scikits…
herrfz
27 votes, 3 answers

How to get the P Value in a Variable from OLSResults in Python?

The OLSResults of df2 = pd.read_csv("MultipleRegression.csv") X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']] Y = df2['Price'] X = add_constant(X) fit = sm.OLS(Y, X).fit() print(fit.summary()) shows the P values of each attribute to only…
Addzy K