A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.
Questions tagged [patsy]
113 questions
1
vote
0 answers
Derivative of patsy dmatrix with respect to a specific variable
Edit: I now have a candidate solution to my question (see toy example below) -- if you can think of something more robust, please let me know.
I just found out about Python's patsy package for creating design matrices from R-style formulas, and it…

Adrian
1
vote
1 answer
Testing the hypothesis of constant returns to scale for a Cobb-Douglas function
I use SciPy-ecosystem packages such as numpy, pandas, and statsmodels for econometrics work such as regression, and now I want a test of whether β1+β2=1.
My formula is: $ Ln(Q_i) = \beta_0 + \beta_1 Ln(L_i) + \beta_2 Ln(K_i) $
I know in Stata I have to…

Mehdi
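As a sketch of what the hypothesis in this question amounts to (an editorial reading, not taken from the truncated excerpt): constant returns to scale is the linear restriction

$$ H_0: \beta_1 + \beta_2 = 1 $$

Substituting $ \beta_2 = 1 - \beta_1 $ into the formula above gives the restricted model

$$ Ln(Q_i) - Ln(K_i) = \beta_0 + \beta_1 \left( Ln(L_i) - Ln(K_i) \right) $$

so the restriction can be checked either with a test of the linear combination $ \beta_1 + \beta_2 $ in the unrestricted fit, or by comparing the unrestricted and restricted fits.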
1
vote
0 answers
Is this the expected behavior of patsy when building a design matrix of a two-level categorical variable without an intercept?
(patsy v0.4.1, python 3.5.0)
I would like to use patsy (ideally through statsmodels) to build a design matrix for regression.
The patsy-style formula that I would like to fit is
response ~ 0 + category
where category is a two-level categorical…

bsmith89
1
vote
0 answers
Error in statsmodels.api OLS predict attribute using complex formula
I am trying to use an OLS regression to predict missing (NaN) values of ustar from known values of wind speed (WS), the variation of WS by month, and radiation (Rn). All variables within the formula do…

Jason
1
vote
1 answer
Converting a dictionary of 16k dicts to a pandas DataFrame
I'm working on a data mining problem for my master's thesis. I'm using Python for data analysis, but I have no experience with pandas, which is needed to convert my data to a DataFrame. In order to do Survival Regression with a Python package called…

Maurice Stam
1
vote
0 answers
How to get rid of main effects when coding interaction between categorical variables in patsy?
I have a problem very similar to:
Interaction effects in patsy with patsy.dmatrices giving duplicate columns for ":" as with "+" , or "*"
except that I have other categorical variables besides the interaction term. My formula is:
f = 'VarDep ~ …

Georges Casamatta
1
vote
1 answer
Easily configure categorical variables
I have a categorical variable, let's say cat_var, which can assume the following values: cat_var = ["A", "B", "C", "D"]
I run a series of regressions, and patsy makes it easy to describe a regression: regr = "y ~ x + C(cat_var)"
I was wondering what…

NoIdeaHowToFixThis
0
votes
0 answers
What are the main differences between Python and R splines?
I am trying to develop a model using natural cubic splines in Python. I have some background using splines in R, but I need to reproduce the model in Python.
In R, this is how I am doing the model:
library(splines)
formula <- as.formula('y ~ x1 + x2 +…

Julia Moore
0
votes
0 answers
How to control for within-subject factor in a mixed model?
I am trying to create a mixed model with within-subject IDs for an ANOVA. Here is my code:
formula = 'DepVar ~ C(Condition)*C(Passage)*C(Order) + (1|C(Participant))'
model = ols(formula, data=anova_df).fit()
Data in the 'Participant' column…

user1911342
0
votes
0 answers
How to programmatically generate all operator combinations of grouped variables (e.g. for regression analysis) in Python
My problem is similar to this answered question: stackoverflow.com/questions/42660752/how-to-create-all-possible-combinations-of-formulas-using-patsy-for-model-select. The accepted answer to that question uses a nested for loop comprising calls to…

Mike
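One possible reading of the question above, sketched with the standard library only: enumerate every assignment of formula operators between consecutive terms. The function name `operator_combinations`, the response `y`, and the terms `a`, `b`, `c` are hypothetical, and since the excerpt is truncated the grouping the asker has in mind may well differ.

```python
from itertools import product


def operator_combinations(response, terms, operators=("+", ":", "*")):
    """Build one formula string per assignment of operators between terms."""
    if not terms:
        raise ValueError("at least one term is required")
    formulas = []
    # One operator slot sits between each pair of consecutive terms.
    for ops in product(operators, repeat=len(terms) - 1):
        rhs = terms[0]
        for op, term in zip(ops, terms[1:]):
            rhs += f" {op} {term}"
        formulas.append(f"{response} ~ {rhs}")
    return formulas


# Hypothetical variable names, for illustration only.
formulas = operator_combinations("y", ["a", "b", "c"], operators=("+", "*"))
for formula in formulas:
    print(formula)
```

The number of formulas grows as the number of operators raised to the number of slots, so in practice the candidate list may need pruning before fitting anything.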
0
votes
0 answers
Simulation using patsy: dmatrices and interaction
I want to simulate data using patsy. Specifically, I want to specify a model from variables which I randomly generate and return the outcome variable (y).
Let's take the following model as an example (just for illustrative purposes):
y = sales_base +…

FredMaster
0
votes
0 answers
Full model cannot use a column whose name is two words
I ran the code below, but it only worked when I used columns whose names were one word
formula_string_indep_vars = ' + '.join(df_cars.drop(columns='Price').columns)
#formula_string = 'Price ~ ' + formula_string_indep_vars
formula_string = 'Price ~ ' +…

charles sarah
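A minimal sketch of one common workaround for column names that are not valid Python identifiers: wrap them in patsy's built-in `Q("...")` quoting helper when assembling the formula string. The helper `quote_name` and the column names below are hypothetical, and only the string construction is shown; the asker's actual data and fitting code are not known.

```python
import keyword


def quote_name(name):
    # Plain identifiers can appear in a formula unchanged.
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    # Anything else (spaces, brackets, ...) is wrapped in patsy's Q("...").
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return 'Q("' + escaped + '")'


def build_formula(response, predictors):
    """Join the response and predictor names into a formula string."""
    rhs = " + ".join(quote_name(p) for p in predictors)
    return f"{quote_name(response)} ~ {rhs}"


# Hypothetical column names, for illustration only.
columns = ["Price", "Engine Size", "Mileage"]
predictors = [c for c in columns if c != "Price"]
formula_string = build_formula("Price", predictors)
print(formula_string)
```

Renaming the columns to identifier-safe names before building the formula is an alternative that keeps the fitted model's term names shorter.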
0
votes
1 answer
How to remove features from regression results using bonferroni correction results?
I implemented a regression model using
formula= "cost ~ C(state) + group_size + C(homeowner) + car_age + C(car_value) +
risk_factor + age_oldest + age_youngest + C(married_couple) + c_previous +
duration_previous + C(a) + C(b) + C(c) + C(d) + C(e)…

Chris
0
votes
1 answer
How to string brackets within a dataframe column heading?
My Excel sheet has Time(s) as a heading.
When I pass it to one of my Python scripts for two-way ANOVA like so:
F1_para1 = 'ROI'
F2_para2 = 'Drug'
value = 'Time(s)'
df['comb'] = df[F1_para1].map(str) + "+" + df[F2_para2].map(str)…
user17304179
0
votes
1 answer
Why is patsy returning 2 columns for my left hand side?
I'm using the patsy Python package. I have a boolean dependent (y) variable, and some number of numerical explanatory variables. I'm hoping for patsy to treat my y variable as a categorical variable, and therefore produce a 1-hot encoding of the…

Migwell