A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.
Questions tagged [patsy]
113 questions
1 vote, 0 answers
How to write formulas for linear mixed effects models in Python (Statsmodels)?
Bear with me, as I'm new to this level of statistics and to Python. I've read all of the statsmodels and patsy documentation but still have doubts.
I am trying to analyse longitudinal data using statsmodels MixedLM. Simplified a bit, I have 5…

EMG

1 vote, 0 answers
How to create a constrained regression in statsmodels?
I need to add a constraint to my linear regression below so that y_pred[i] >= y_pred[i+1]:
from patsy import dmatrix
import statsmodels.api as sm
xknot = dmatrix("bs(x, knots=(0.0001,1,2,4,6,8,9.3), degree=3, include_intercept=False)",
…

Fahad Ward

1 vote, 1 answer
Adding Regularization of Coefficients to Statsmodels (or Patsy)
Given that I have the following patsy formula,
'y ~ a + b + c'
and pass it to statsmodels' ols, how can I add a regularization term on the regression coefficients?
In this case, I wish to create my own penalisation function, not simply use ridge,…

Little Bobby Tables

1 vote, 1 answer
How can I turn a list of column names into a patsy formula string?
I have a list of pandas column names (all of them dummy variables) that I would like to turn into a formula string to copy and paste into statsmodels.
Is there a way to programmatically do this?
Example code
list = ['yrs_owned_model_28',…

Jordan
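One way to do this programmatically, as a minimal sketch: assuming the names are valid Python identifiers (as the dummy-variable names in the excerpt appear to be), joining them with " + " gives the right-hand side of a patsy formula. The function name to_formula, the response name y, and the two extra column names are hypothetical placeholders.

```python
def to_formula(columns, response=None):
    """Join column names into a patsy-style formula string.

    If `response` is given, it is placed on the left of "~";
    otherwise only the right-hand side is returned.
    """
    rhs = " + ".join(columns)
    return f"{response} ~ {rhs}" if response else rhs


# Hypothetical dummy-variable names in the style of the question.
dummies = ["yrs_owned_model_28", "yrs_owned_model_29", "yrs_owned_model_30"]
print(to_formula(dummies))
print(to_formula(dummies, response="y"))
```

Naming the list dummies rather than list also avoids shadowing Python's built-in list type.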

1 vote, 1 answer
Multiple Linear Regression with Python statsmodels
In R, it is possible to execute multiple linear regression like
temp = lm(log(volume_1[11:62])~log(price_1[11:62])+log(volume_1[10:61]))
In Python, it is possible to run multiple linear regression with an R-style formula, so I thought the…

Park Dongyeon
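The main pitfall in porting that R call is indexing: R's x[11:62] is 1-based and includes both ends, so it corresponds to Python's x[10:62], and the lagged x[10:61] corresponds to x[9:61]. The sketch below builds the three log series under that mapping; the helper names r_slice and lagged_log_columns and the sample data are hypothetical, not from the question.

```python
import math


def r_slice(seq, start, end):
    """Mimic R's seq[start:end]: 1-based indices, both ends inclusive."""
    return seq[start - 1:end]


def lagged_log_columns(volume, price, start=11, end=62):
    """Build the three series used in the R call
    lm(log(volume[11:62]) ~ log(price[11:62]) + log(volume[10:61])).
    """
    y = [math.log(v) for v in r_slice(volume, start, end)]
    x_price = [math.log(p) for p in r_slice(price, start, end)]
    # Shifting both bounds down by one gives the one-period lag of volume.
    x_lag = [math.log(v) for v in r_slice(volume, start - 1, end - 1)]
    return y, x_price, x_lag


# Hypothetical positive series of length 62 (the largest index R uses).
volume = [float(i) for i in range(1, 63)]
price = [2.0 * i for i in range(1, 63)]
y, x_price, x_lag = lagged_log_columns(volume, price)
print(len(y), len(x_price), len(x_lag))
```

Placed in a pandas DataFrame, the three equal-length series could then be fitted with a formula such as "y ~ x_price + x_lag".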

1 vote, 1 answer
Replicate Scipy's RegressionResults.predict functionality
Here's my sample program:
import numpy as np
import pandas as pd
import statsmodels
from statsmodels.formula.api import ols
df = pd.DataFrame({"z": [1,1,1,2,2,2,3,3,3],
"x":[0,1,2,0,1,2,0,1,2],
…

rndeon

1 vote, 1 answer
Stop patsy dmatrix from dropping NaN rows
I would like to use patsy's dmatrix function to generate a design matrix in which rows with NaN values are preserved. For example, the following code would return a design matrix with four rows, which is what we would normally want. However, in this…

Abiel

1 vote, 0 answers
Statsmodels with Patsy: keep the order of the input string
Patsy, which is nicely integrated into statsmodels, lets you write R-style formulas as strings:
import statsmodels.formula.api as smf
res = smf.ols("Wealth ~ Age + Income + Happy", data=df).fit()
print(res.summary())
This will…

Adrien Pacifico

1 vote, 2 answers
build design matrix python
Suppose I have an R × C contingency table, i.e. R rows and C columns. I want a matrix, X, of dimension RC × (R + C − 2) that contains the R − 1 “main effects” for the rows and the C − 1 “main effects” for the columns. For example, if you…

iwtbid
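A minimal sketch, assuming treatment (dummy) coding with the first row and first column as reference levels and cells listed row by row; the question's own example is cut off, so the coding it expects is not known. Each of the RC cells gets R − 1 row indicators followed by C − 1 column indicators, which gives exactly R + C − 2 columns. The function name main_effects_matrix is hypothetical.

```python
def main_effects_matrix(R, C):
    """Return an (R*C) x (R + C - 2) main-effects design matrix.

    Cells are listed row-major: (0, 0), (0, 1), ..., (R-1, C-1).
    Treatment coding is assumed, with row 0 and column 0 as the
    reference levels, so they get no indicator column of their own.
    """
    X = []
    for i in range(R):
        for j in range(C):
            row_part = [1 if i == k else 0 for k in range(1, R)]
            col_part = [1 if j == k else 0 for k in range(1, C)]
            X.append(row_part + col_part)
    return X


for line in main_effects_matrix(2, 3):
    print(line)
```

With patsy, a formula such as C(row) + C(col) on a long-format version of the table should yield these same indicator columns plus an intercept column (the factor names row and col are hypothetical). Effect (sum-to-zero) coding is the other common choice for contingency-table models.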

1 vote, 1 answer
formatting design matrix for regression
I am given a test set without the response variable. I have already built the model and need to predict the response variable in the testing set.
I am having trouble formatting the test design matrix so that it would be compatible.
I am using…

anticavity123

1 vote, 2 answers
Creating dummy variables using pandas or statsmodels for the interaction of two columns
I have a data frame like this:
Index  ID   Industry  years_spend  asset
6646   892  4         4            144.977037
2347   315  10        8            137.749138
7342   985  1         5            104.310217
137    18   5         5            …

Mehdi
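A minimal pure-Python sketch of one interpretation, assuming "interaction" means one 0/1 column per observed (Industry, years_spend) pair; it uses the four rows shown in the table and leaves out the asset column. The function name interaction_dummies and the column-naming scheme are hypothetical. In a patsy formula, a term such as C(Industry):C(years_spend) expresses the same interaction, although patsy applies its own coding and column names.

```python
def interaction_dummies(rows, a, b):
    """One-hot encode the interaction of columns `a` and `b`.

    Each distinct (a, b) pair becomes its own 0/1 column, named
    "a=<value>:b=<value>" (a hypothetical naming scheme).
    """
    levels = sorted({(r[a], r[b]) for r in rows})
    names = [f"{a}={va}:{b}={vb}" for va, vb in levels]
    matrix = [[1 if (r[a], r[b]) == lvl else 0 for lvl in levels]
              for r in rows]
    return names, matrix


# The four rows shown in the question (asset omitted).
rows = [
    {"ID": 892, "Industry": 4, "years_spend": 4},
    {"ID": 315, "Industry": 10, "years_spend": 8},
    {"ID": 985, "Industry": 1, "years_spend": 5},
    {"ID": 18, "Industry": 5, "years_spend": 5},
]
names, matrix = interaction_dummies(rows, "Industry", "years_spend")
print(names)
for line in matrix:
    print(line)
```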

1 vote, 1 answer
python patsy intercept term in cubic splines
I'm trying to understand how cubic splines are generated by Python's patsy library. As far as I can see from the output of
import numpy as np
from patsy import dmatrix
x = np.linspace(0., 1., 100)
y1 = dmatrix("bs(x, df=6, degree=3,…

hovnatan
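The intercept question likely comes down to the partition-of-unity property: a full B-spline basis sums to 1 at every point inside the boundary knots, so it already spans the constant function and would be collinear with a separate intercept column. patsy's documentation describes its default include_intercept=False as producing a basis that does not span the intercept. The sketch below is a plain Cox-de Boor evaluation, not patsy's own code; the interior knots are hypothetical.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via Cox-de Boor recursion.

    `knots` is the full, non-decreasing knot vector (boundary knots
    repeated degree + 1 times). Returns len(knots) - degree - 1 values.
    Intervals are half-open, so x must lie in [knots[0], knots[-1]).
    """
    # Degree-0 basis: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # A zero denominator (repeated knot) contributes nothing.
            left = (x - knots[i]) / left_den * basis[i] if left_den else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den else 0.0)
            new.append(left + right)
        basis = new
    return basis


degree = 3
inner = [0.25, 0.5, 0.75]  # hypothetical interior knots on [0, 1]
knots = [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)

for x in (0.1, 0.4, 0.9):
    full = bspline_basis(x, knots, degree)
    # The full basis sums to 1 (up to rounding) at every interior x.
    print(x, len(full), sum(full))
```

Dropping any one column breaks that identity, which is why the reduced basis can sit alongside an explicit intercept without perfect collinearity.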

1 vote, 1 answer
How to create all possible combinations of formulas using Patsy for model selection?
I am currently using Python's Patsy module to create matrix inputs for my model. For example, a formula I might use is
'Survived ~ C(Pclass) + C(Sex) + C(honor) + C(tix) + Age + SibSp + ParCh + Fare + Embarked + vowel + middle + C(Title)'
However,…

Naomi
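A minimal sketch of exhaustive formula generation with itertools.combinations, assuming every non-empty subset of the candidate terms should be tried. Only four of the question's terms are used to keep the output short; the helper name all_formulas is hypothetical. With all 12 terms from the question's formula there would be 2**12 − 1 = 4095 formulas, so the search grows exponentially with the number of terms.

```python
from itertools import combinations


def all_formulas(response, terms, min_size=1):
    """Yield a formula for every subset of `terms` of size >= min_size.

    Subsets come out ordered by size, then in the order the terms
    were given; with n terms and min_size=1 there are 2**n - 1.
    """
    for k in range(min_size, len(terms) + 1):
        for subset in combinations(terms, k):
            yield f"{response} ~ " + " + ".join(subset)


# A few of the terms from the question's formula.
terms = ["C(Pclass)", "C(Sex)", "Age", "Fare"]
formulas = list(all_formulas("Survived", terms))
print(len(formulas))
print(formulas[0])
print(formulas[-1])
```

Each string can then be passed to a formula-based fitting function and the fits compared by a criterion such as AIC; the excerpt does not say which model or criterion is intended.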

1 vote, 0 answers
statsmodels patsy hypothesis testing
Not sure where this belongs, so I am also asking this on Cross Validated. I am running the following regression:
from patsy import dmatrices
import statsmodels.api as sm
y, X = dmatrices('M ~ I(4.8*(Q**0.8)) ', data=DF, return_type='dataframe')
res =…

dayum

1 vote, 1 answer
Parsing a Pandas dataframe with an unknown number of columns for use in statsmodels.api
I would like to create a generic script to perform linear regressions on multiple data sets. Each data set will have the same y-variable called "SM" and an unknown number of x-variables. I have been able to do this successfully if I know exactly…

keirasan
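Since each data set has the response SM plus an unknown set of x-variables, one option is to build the formula string at run time from the DataFrame's column names (for example list(df.columns)). This sketch assumes every non-response column should enter as a linear term, and wraps names that are not plain Python identifiers in patsy's Q() quoting function; the helper names and example columns are hypothetical.

```python
import keyword


def quote_term(name):
    """Wrap a column name in patsy's Q() unless it is a plain identifier."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    return f"Q({name!r})"


def formula_for(columns, response="SM"):
    """Build "SM ~ x1 + x2 + ..." from whatever columns a data set has."""
    predictors = [quote_term(c) for c in columns if c != response]
    if not predictors:
        raise ValueError("no predictor columns besides the response")
    return f"{quote_term(response)} ~ " + " + ".join(predictors)


# Hypothetical column lists for two data sets with different x-variables.
print(formula_for(["SM", "temp", "rain"]))
print(formula_for(["depth cm", "SM", "clay_pct"]))
```

The resulting string can be passed, together with the DataFrame, to a formula-based fitter such as statsmodels.formula.api.ols, so the same script handles any number of x-variables.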