Questions tagged [patsy]

A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.

113 questions
1
vote
0 answers

How to write formulas for linear mixed effects models in Python (Statsmodels)?

Bear with me as I'm new to this level of statistics and to Python. I've read all the documents from statsmodels and patsy but still have doubts. I am trying to analyse longitudinal data using statsmodels MixedLM. Simplified a bit, I have 5…
EMG
  • 11
  • 3
1
vote
0 answers

How to create a constrained regression in statsmodels?

I need to add a constraint to my linear regression below so that y_pred[i] >= y_pred[i+1] from patsy import dmatrix import statsmodels.api as sm xknot = dmatrix("bs(x, knots=(0.0001,1,2,4,6,8,9.3), degree=3, include_intercept=False)", …
Fahad Ward
  • 11
  • 2
1
vote
1 answer

Adding Regularization of Coefficients to Statsmodels (or Patsy)

Given that I have the following patsy formula, 'y ~ a + b + c', and pass it to statsmodels.ols, how can I add a regularization term to the regression coefficients? In this case, I wish to create my own penalisation function, not simply use ridge,…
Little Bobby Tables
  • 4,466
  • 4
  • 29
  • 46
1
vote
1 answer

How can I turn a list of column names into a patsy formula string?

I have a list of pandas column names (consisting of all dummy variables) that I would like to turn into a formula string to copy and paste for statsmodels. Is there a way to programmatically do this? Example code list = ['yrs_owned_model_28',…
Jordan
  • 1,415
  • 3
  • 18
  • 44
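A minimal sketch of one way to build such a string, assuming plain string joining is enough. The column names and the response name `y` below are hypothetical (the list in the question is truncated), and `make_formula` is an illustrative helper, not part of patsy or statsmodels. Names that are not valid Python identifiers are wrapped in patsy's `Q()` quoting helper.

```python
def make_formula(response, columns):
    """Build an R-style formula string such as 'y ~ a + b' from column names."""
    terms = []
    for name in columns:
        # Names that are not valid Python identifiers are wrapped in patsy's Q()
        # so the formula parser treats them as a single variable.
        terms.append(name if name.isidentifier() else f'Q("{name}")')
    return f"{response} ~ {' + '.join(terms)}"


# Hypothetical names; the list in the question is truncated.
columns = ["dummy_a", "dummy_b", "dummy c"]
formula = make_formula("y", columns)
print(formula)  # y ~ dummy_a + dummy_b + Q("dummy c")
```

The resulting string can be passed wherever a formula is accepted, so there is no need to copy and paste it by hand.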
1
vote
1 answer

Multiple Linear Regression with Python statsmodels

In R, it is possible to execute multiple linear regression like temp = lm(log(volume_1[11:62])~log(price_1[11:62])+log(volume_1[10:61])) In Python, it is possible to execute multiple linear regression with R style formula so I thought the…
1
vote
1 answer

Replicate Scipy's RegressionResults.predict functionality

Here's my sample program: import numpy as np import pandas as pd import statsmodels from statsmodels.formula.api import ols df = pd.DataFrame({"z": [1,1,1,2,2,2,3,3,3], "x":[0,1,2,0,1,2,0,1,2], …
rndeon
  • 125
  • 7
1
vote
1 answer

Stop patsy dmatrix from dropping NaN rows

I would like to use patsy's dmatrix function to generate a design matrix in which rows with NaN values are preserved. For example, the following code would return a design matrix with four rows, which is what we would normally want. However, in this…
Abiel
  • 5,251
  • 9
  • 54
  • 74
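One possible approach, sketched under the assumption that NaN values may simply propagate into the design matrix: pass an `NAAction` whose `NA_types` list is empty, so patsy treats no value as missing and drops nothing. The data below is made up for illustration.

```python
import numpy as np
from patsy import dmatrix, NAAction

# Hypothetical data with one missing value in the third row.
data = {"x": np.array([1.0, 2.0, np.nan, 4.0])}

# Default behaviour: the row containing NaN is dropped.
dropped = dmatrix("x", data)

# With an empty NA_types list nothing counts as missing, so every row is kept
# and the NaN appears in the design matrix itself.
kept = dmatrix("x", data, NA_action=NAAction(NA_types=[]))

print(dropped.shape[0], kept.shape[0])
```

Downstream estimators still have to cope with the NaN entries, so this only moves the missing-data decision out of patsy.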
1
vote
0 answers

Statsmodels with Patsy: keep the order of the input string

Patsy, which is nicely integrated into Statsmodels, lets you write R-style formulas as strings. import statsmodels.formula.api as smf res = smf.OLS.from_formula("Wealth ~ Age + Income + Happy", data=df).fit() Print res.summary() This will…
Adrien Pacifico
  • 1,649
  • 1
  • 15
  • 33
1
vote
2 answers

build design matrix python

Suppose I have an R×C contingency table. This means there are R rows and C columns. I want a matrix, X, of dimension RC × (R + C − 2) that contains the R − 1 “main effects” for the rows and the C − 1 “main effects” for the columns. For example, if you…
iwtbid
  • 85
  • 4
  • 9
1
vote
1 answer

formatting design matrix for regression

I am given a test set without the response variable. I have already built the model and need to predict the response variable in the testing set. I am having trouble formatting the test design matrix so that it would be compatible. I am using…
anticavity123
  • 111
  • 1
  • 9
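If the training matrix was built with patsy (an assumption, since the excerpt is cut off), a compatible test matrix can be built by reusing the training `design_info` with `build_design_matrices`. The variable names and data here are hypothetical.

```python
from patsy import dmatrices, build_design_matrices

# Hypothetical training data: numeric x, categorical g, response y.
train = {
    "y": [1.0, 2.0, 3.0, 4.0],
    "x": [0.0, 1.0, 2.0, 3.0],
    "g": ["a", "b", "a", "b"],
}
y_train, X_train = dmatrices("y ~ x + C(g)", train)

# Test data has no response and happens to contain only one level of g.
test = {"x": [1.5, 2.5], "g": ["b", "b"]}

# Reusing the training design_info applies the same columns and category
# coding, so the test matrix lines up with the fitted coefficients.
(X_test,) = build_design_matrices([X_train.design_info], test)

print(X_train.design_info.column_names)
print(X_test.shape)
```

Because only the right-hand-side `design_info` is reused, the missing response column in the test set is not a problem.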
1
vote
2 answers

Creating dummy variable using pandas or statsmodel for interaction of two columns

I have a data frame like this: Index ID Industry years_spend asset 6646 892 4 4 144.977037 2347 315 10 8 137.749138 7342 985 1 5 104.310217 137 18 5 5 …
Mehdi
  • 1,260
  • 2
  • 16
  • 36
1
vote
1 answer

python patsy intercept term in cubic splines

I'm trying to understand cubic spline generation in patsy library of python. As far as I can see from the output of import numpy as np from patsy import dmatrix x = np.linspace(0., 1., 100) y1 = dmatrix("bs(x, df=6, degree=3,…
hovnatan
  • 1,331
  • 10
  • 23
1
vote
1 answer

How to create all possible combinations of formulas using Patsy for model selection?

I am currently using Python's Patsy module to create matrix inputs for my model. For example, a formula I might use is 'Survived ~ C(Pclass) + C(Sex) + C(honor) + C(tix) + Age + SibSp + ParCh + Fare + Embarked + vowel + middle + C(Title)' However,…
Naomi
  • 93
  • 2
  • 9
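A rough sketch of enumerating candidate formulas with `itertools.combinations`, assuming every non-empty subset of terms is a valid model. `all_formulas` is an illustrative helper, and the short term list is a hypothetical subset of the terms in the question.

```python
from itertools import combinations


def all_formulas(response, terms):
    """Yield one formula string for every non-empty subset of terms."""
    for size in range(1, len(terms) + 1):
        for subset in combinations(terms, size):
            yield f"{response} ~ {' + '.join(subset)}"


# Hypothetical short list; n terms give 2**n - 1 formulas.
terms = ["C(Pclass)", "C(Sex)", "Age"]
formulas = list(all_formulas("Survived", terms))
for formula in formulas:
    print(formula)
```

The number of formulas doubles with each extra term, so exhaustive enumeration is only practical for short term lists.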
1
vote
0 answers

statsmodels patsy hypothesis testing

Not sure where this belongs, so I am asking this on Cross Validated also. I am running the following regression: from patsy import dmatrices import statsmodels.api as sm y, X = dmatrices('M ~ I(4.8*(Q**0.8)) ', data=DF, return_type='dataframe') res =…
dayum
  • 1,073
  • 15
  • 31
1
vote
1 answer

Parsing a Pandas dataframe with an unknown number of columns for use in statsmodels.api

I would like to create a generic script to perform linear regressions on multiple data sets. Each data set will have the same y-variable called "SM" and an unknown number of x-variables. I have been able to do this successfully if I know exactly…
keirasan
  • 365
  • 2
  • 6