Questions tagged [patsy]

A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.

113 questions
1
vote
0 answers

Derivative of patsy dmatrix with respect to a specific variable

Edit: I now have a candidate solution to my question (see toy example below) -- if you can think of something more robust, please let me know. I just found out about Python's patsy package for creating design matrices from R-style formulas, and it…
Adrian
  • 3,138
  • 2
  • 28
  • 39
1
vote
1 answer

Testing the hypothesis of constant returns to scale for a Cobb-Douglas function

I use SciPy-ecosystem packages such as numpy and pandas, plus statsmodels, for econometrics work such as regression, and now I want a test of whether β1+β2=1. My formula is: $ \ln(Q_i) = \beta_0 + \beta_1 \ln(L_i) + \beta_2 \ln(K_i) $ I know in Stata I have to…
Mehdi
  • 1,260
  • 2
  • 16
  • 36
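
For the restriction β1+β2=1 in the excerpt above, here is a minimal sketch of how the hypothesis can be written as a linear constraint with patsy. The column names `lnQ`, `lnL`, `lnK` and the data values are made up for illustration. As far as I know, the `t_test` method on fitted statsmodels results accepts the same kind of constraint string.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Hypothetical data: logged output, labour and capital (values are made up).
df = pd.DataFrame({
    "lnQ": [2.1, 2.4, 2.9, 3.3, 3.6],
    "lnL": [1.0, 1.2, 1.5, 1.9, 2.0],
    "lnK": [0.8, 1.1, 1.3, 1.4, 1.8],
})

y, X = dmatrices("lnQ ~ lnL + lnK", df)

# Express beta_1 + beta_2 = 1 as R @ beta = q over the design-matrix columns.
constraint = X.design_info.linear_constraint("lnL + lnK = 1")
R = constraint.coefs       # one row per restriction, one column per coefficient
q = constraint.constants   # right-hand side of each restriction
```

The row of `R` puts a zero weight on the intercept and unit weights on the two slope coefficients, which is the matrix form of the restriction.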
1
vote
0 answers

Is this the expected behavior of patsy when building a design matrix of a two-level categorical variable without an intercept?

(patsy v0.4.1, python 3.5.0) I would like to use patsy (ideally through statsmodels) to build a design matrix for regression. The patsy-style formula that I would like to fit is response ~ 0 + category where category is a two-level categorical…
bsmith89
  • 223
  • 2
  • 6
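
A small sketch of the behaviour this question describes, using a made-up two-level column. With an intercept, patsy uses reduced-rank treatment coding for the categorical term; with `0 +`, the first categorical term is coded full-rank, so both levels get their own column.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical two-level categorical variable.
df = pd.DataFrame({"category": ["a", "b", "a", "b"]})

with_intercept = dmatrix("category", df)
no_intercept = dmatrix("0 + category", df)

# Column names show which coding patsy picked in each case.
cols_with = with_intercept.design_info.column_names
cols_without = no_intercept.design_info.column_names
```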
1
vote
0 answers

Error in statsmodels.api OLS predict attribute using complex formula

I am trying to use an OLS regression to predict missing (NaN) values of ustar from wind speed (WS), the variation of WS by month, and radiation (Rn), fitting on rows where all of these variables are known. All variables within the formula do…
Jason
  • 181
  • 2
  • 14
1
vote
1 answer

Converting a dictionary of 16k dicts to a pandas DataFrame

I'm working on a data mining problem for my master's thesis. I'm using Python for data analysis, but I have no experience with pandas, which is needed to convert my data to a DataFrame. In order to do Survival Regression with a Python package called…
Maurice Stam
  • 79
  • 3
  • 8
1
vote
0 answers

How to get rid of main effects when coding interaction between categorical variables in patsy?

I have a problem very similar to: Interaction effects in patsy with patsy.dmatrices giving duplicate columns for ":" as with "+" or "*", except that I have other categorical variables besides the interaction term. My formula is: f = 'VarDep ~ …
1
vote
1 answer

Easily configure categorical variables

I have a categorical variable, let's say cat_var, which can assume the following values: cat_var = ["A", "B", "C", "D"] I run a series of regressions and patsy makes it easy to describe a regression: regr = "y ~ x + C(cat_var)" I was wondering what…
NoIdeaHowToFixThis
  • 4,484
  • 2
  • 34
  • 69
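
The excerpt is cut off before the actual question, so this is only a sketch of one common way to configure a categorical term: choosing the baseline level through the second argument of `C()`. The data frame is made up.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical data using the four levels from the excerpt.
df = pd.DataFrame({"cat_var": ["A", "B", "C", "D", "B", "A"]})

# Default: treatment coding with the first level ("A") as the baseline.
default = dmatrix("C(cat_var)", df)

# Same coding, but with "B" as the baseline level instead.
rebased = dmatrix('C(cat_var, Treatment(reference="B"))', df)
rebased_cols = rebased.design_info.column_names
```

Both matrices have an intercept plus three dummy columns; only the level that is absorbed into the intercept changes.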
0
votes
0 answers

What are the main differences between Python and R splines?

I am trying to develop a model using natural cubic splines in Python. I have some background using splines in R but I need to reproduce in Python. In R, this is how I am doing the model: library(splines) formula <- as.formula('y ~ x1 + x2 +…
0
votes
0 answers

How to control for within-subject factor in a mixed model?

I am trying to create a mixed model with within-subject IDs for ANOVA. Here is my code: formula = 'DepVar ~ C(Condition)*C(Passage)*C(Order) + (1|C(Participant))' model = ols(formula, data=anova_df).fit() Data in the 'Participant' column…
0
votes
0 answers

How to programmatically generate all operator combinations of grouped variables (e.g. for regression analysis) in Python

My problem is similar to this answered question: stackoverflow.com/questions/42660752/how-to-create-all-possible-combinations-of-formulas-using-patsy-for-model-select. The accepted answer to that question uses a nested for loop comprising calls to…
Mike
  • 21
  • 8
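
The excerpt is truncated, so the following is only a sketch of a simplified version of the problem: given an ordered list of variables, build one formula string per way of placing `+`, `:` or `*` between neighbours. The function name `operator_formulas` is hypothetical, and grouping of variables is not handled.

```python
from itertools import product


def operator_formulas(response, variables, operators=("+", ":", "*")):
    """Yield one formula per way of placing an operator between adjacent variables."""
    for ops in product(operators, repeat=len(variables) - 1):
        rhs = variables[0]
        for op, var in zip(ops, variables[1:]):
            rhs += f" {op} {var}"
        yield f"{response} ~ {rhs}"


# Three variables leave two operator slots, so 3 ** 2 formulas are produced.
formulas = list(operator_formulas("y", ["a", "b", "c"]))
```

Using `itertools.product` replaces the nested loops with a single iterator, so the same function works for any number of variables.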
0
votes
0 answers

Simulation using patsy: dmatrices and interaction

I want to simulate data using patsy. Specifically I want to specify a model from variables which I randomly generate and return the outcome variable (y). Let's take the following model as an example (just for illustrative purposes): y = sales_base +…
FredMaster
  • 1,211
  • 1
  • 15
  • 35
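
A minimal sketch of simulating an outcome from a patsy design matrix. The predictors `price` and `promo`, the coefficient values and the noise scale are all assumptions made for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

rng = np.random.default_rng(0)
n = 200

# Hypothetical randomly generated predictors.
df = pd.DataFrame({
    "price": rng.uniform(1.0, 5.0, size=n),
    "promo": rng.integers(0, 2, size=n),
})

# "price * promo" expands to Intercept + price + promo + price:promo.
X = dmatrix("price * promo", df)

# Assumed true coefficients, in the same order as the design-matrix columns.
beta = np.array([10.0, -1.5, 2.0, 0.5])

# Simulated outcome: linear predictor plus Gaussian noise.
y = np.asarray(X) @ beta + rng.normal(scale=1.0, size=n)
```

Checking `X.design_info.column_names` before choosing `beta` keeps each coefficient aligned with the column it is meant for.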
0
votes
0 answers

Full model cannot use a column whose name is two words

I ran the code below, but it only worked when I used columns whose names were one word: formula_string_indep_vars = ' + '.join(df_cars.drop(columns='Price').columns) #formula_string = 'Price ~ ' + formula_string_indep_vars formula_string = 'Price ~ ' +…
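
One way this is usually handled is patsy's built-in `Q()`, which quotes a column name that is not a valid Python identifier. A sketch, assuming a small made-up stand-in for `df_cars` with an `Engine Size` column:

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical stand-in for df_cars; "Engine Size" contains a space.
df_cars = pd.DataFrame({
    "Price": [20000, 15000, 30000, 12000],
    "Engine Size": [2.0, 1.6, 3.0, 1.2],
    "Mileage": [30000, 60000, 10000, 90000],
})

# Wrap every predictor name in Q("...") so spaces do not break the formula.
indep_vars = " + ".join(
    f'Q("{col}")' for col in df_cars.drop(columns="Price").columns
)
formula_string = "Price ~ " + indep_vars

y, X = dmatrices(formula_string, df_cars)
```

The same quoting should also cover headings with brackets or other punctuation, since `Q()` looks the name up as a plain string.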
0
votes
1 answer

How to remove features from regression results using Bonferroni correction results?

I implemented a regression model using formula= "cost ~ C(state) + group_size + C(homeowner) + car_age + C(car_value) + risk_factor + age_oldest + age_youngest + C(married_couple) + c_previous + duration_previous + C(a) + C(b) + C(c) + C(d) + C(e)…
Chris
  • 353
  • 3
  • 9
0
votes
1 answer

How to handle brackets within a DataFrame column heading?

My Excel sheet has Time(s) as a heading. When I input it into one of my Python scripts for two-way ANOVA like so: F1_para1 = 'ROI' F2_para2 = 'Drug' value = 'Time(s)' df['comb'] = df[F1_para1].map(str) + "+" + df[F2_para2].map(str)…
user17304179
0
votes
1 answer

Why is patsy returning 2 columns for my left hand side?

I'm using the patsy Python package. I have a boolean dependent (y) variable, and some number of numerical explanatory variables. I'm hoping for patsy to treat my y variable as a categorical variable, and therefore produce a 1-hot encoding of the…
Migwell
  • 18,631
  • 21
  • 91
  • 160
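
A short sketch of the behaviour in the last question, with made-up data. patsy treats a boolean column as categorical, and the left-hand side has no intercept, so the term is coded full-rank with one column per level. Casting the column to integers first is one possible way to get a single numeric column; whether that suits the asker's model is an assumption.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical data: boolean response, numeric predictor.
df = pd.DataFrame({
    "y": [True, False, True, False, True],
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Boolean response: one column for False and one for True.
y_bool, X = dmatrices("y ~ x", df)

# Integer response: a single numeric 0/1 column.
y_int, _ = dmatrices("y ~ x", df.assign(y=df["y"].astype(int)))
```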