Questions tagged [patsy]

A Python library for describing statistical models and building design matrices, aimed at bringing the convenience of R “formulas” to Python.

113 questions
2
votes
1 answer

ipython notebook and patsy categorical variable (formula)

I had the same error as in this question. What is weird is that it works (with the answer provided) in an IPython shell, but not in an IPython notebook. It is related to the C() operator, because without it the code works (but not as an operator). Same…
jrjc
  • 21,103
  • 9
  • 64
  • 78
2
votes
1 answer

Fetching names from DesignMatrix in patsy

from patsy import * from pandas import * dta = DataFrame([["lo", 1],["hi", 2.4],["lo", 1.2],["lo", 1.4],["very_high",1.8]], columns=["carbs", "score"]) dmatrix("carbs + score", dta) DesignMatrix with shape (5, 4) Intercept carbs[T.lo] …
ekta
  • 1,560
  • 3
  • 28
  • 57
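A minimal sketch of one way to read the column names back, assuming the same toy data as the excerpt but passed as a plain dict rather than a DataFrame. The `design_info` attribute of the returned matrix holds its metadata:

```python
from patsy import dmatrix

# Same toy data as the question, as a dict of lists instead of a DataFrame.
data = {
    "carbs": ["lo", "hi", "lo", "lo", "very_high"],
    "score": [1, 2.4, 1.2, 1.4, 1.8],
}

dm = dmatrix("carbs + score", data)

# design_info carries the metadata for the matrix, including column names.
column_names = dm.design_info.column_names
print(column_names)
```

The names come back as an ordinary list of strings, so they can be reused as DataFrame column labels.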
1
vote
1 answer

Python natural spline function cr in patsy only accepts 3 or more degrees of freedom, whereas ns in R accepts 2

I am trying to port this functionality into python > x <- 0:10 > y <- x**2 > lm(y ~ ns(x,df=2)) Such as: import numpy as np import pandas as pd import statsmodels.formula.api as smf x = pd.DataFrame(np.arange(11)) y = x**2 formula="y ~ cr(x, df =…
1
vote
0 answers

Is there a way to compare only particular groups in anova model test using statsmodels in Python similar to SAS's CONTRAST

In SAS there is a CONTRAST statement that allows the comparison of only particular groups/means based on the model constructed with all data. In SAS, an L vector is used for such operations.…
1
vote
1 answer

Python ModuleNotFoundError: No module named 'patsy' when using ggplot

I am attempting to run the following code to plot the explained variance after applying PCA on my dataframe: (ggplot(pcaDF, aes(x = "Principal Components", y = "expl_var")) + geom_line() + geom_point()) However, I keep on getting this error…
sums22
  • 1,793
  • 3
  • 13
  • 25
1
vote
2 answers

Using Variable instead of column name in Statsmodel formula API

I have a variable cols that contains a list of column names for my table. Now I want to run a regression on my table by looping through the different columns in cols. I am trying to use the statsmodels formula API (Patsy) but am unable to construct a…
Bhavya Budhia
  • 115
  • 1
  • 2
  • 10
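Since a formula is just a string, one common approach is to build it from the column names before handing it to the formula API. The sketch below is only an illustration: the helper `build_formula`, the response name `y`, and the column names are all hypothetical. It wraps names that are not valid Python identifiers in patsy's `Q("...")` quoting helper.

```python
def build_formula(response, predictors):
    """Join column names into a formula string such as 'y ~ a + b'."""
    # Q("...") is patsy's quoting helper for names that are not valid identifiers.
    terms = [p if p.isidentifier() else f'Q("{p}")' for p in predictors]
    return f"{response} ~ " + " + ".join(terms)


cols = ["age", "income", "years of education"]  # hypothetical column names

# One single-predictor formula per column, ready to loop over.
formulas = [build_formula("y", [c]) for c in cols]
print(formulas)
```

Each string can then be passed as the formula argument of a call such as `statsmodels.formula.api.ols(formula, data=df)`.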
1
vote
2 answers

Does python have an analogue to R's splines::ns()

I would like to replicate making this Q matrix in python, but I can't seem to make it happen. min = 0 max = 10 tau = seq(min, max) pDegree = 5 Q <- splines::ns(tau, pDegree) print(Q) Here are some tries in python import numpy as np from patsy…
Alex
  • 2,603
  • 4
  • 40
  • 73
1
vote
0 answers

Using GLM to reproduce built-in regression models in statsmodels

I am currently trying to reproduce a regression model eq. (3) (edit: fixed link) in python using statsmodels. As this model is not part of the standard models provided by statsmodels I clearly have to write it myself using the provided formula…
ICST
  • 21
  • 1
  • 5
1
vote
0 answers

retrieve patsy's levels and encoding of categorical variables when transforming data to a design matrix

When there are categorical variables in the formula, then patsy needs the full original dataset to rebuild the category levels and encoding. After data is transformed to a design matrix, is there a way to retrieve patsy's levels and encoding for…
morfys
  • 2,195
  • 3
  • 28
  • 35
1
vote
1 answer

Removing categories with patsy and statsmodels

I am using statsmodels and patsy for building a logistic regression model. I'll use pseudocode here. Let's assume I have a dataframe containing a categorical variable, say Country, with 200 levels. I have reasons to believe some of them would be…
famargar
  • 3,258
  • 6
  • 28
  • 44
1
vote
0 answers

Converting new dataset into a previous patsy dmatrix form

I have divided my data set into 2 sets, a training set and a test set. I have 6 types of dummy variables in my data. Every time I try to run the model on my training set I get an error. This is my code: X = dmatrix('sfdc_tier + poc_image + sub_segment +…
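For this kind of train/test split, patsy offers `build_design_matrices`, which applies the encoding learned from one data set to another. The following is a rough sketch with made-up data: the column `tier` merely stands in for one of the dummy-coded variables, and nothing here reproduces the asker's actual error.

```python
from patsy import dmatrix, build_design_matrices

# Hypothetical data; "tier" stands in for one of the dummy-coded columns.
train = {"tier": ["a", "b", "c", "a"], "x": [1.0, 2.0, 3.0, 4.0]}
test = {"tier": ["b", "b"], "x": [5.0, 6.0]}

X_train = dmatrix("tier + x", train)

# Reuse the training design_info so the test matrix gets identical columns,
# even though the test set does not contain every level of "tier".
(X_test,) = build_design_matrices([X_train.design_info], test)

print(X_test.design_info.column_names)
```

Calling `dmatrix` separately on the test set would instead infer the levels from the test data alone, which can produce a different number of columns.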
1
vote
1 answer

Is there a way to run GLM.from_formula without the intercept (PyMC3)?

This may be a dumb question but I've searched through the PyMC3 docs and forums and can't seem to find the answer. I'm trying to create a linear regression model from a dataset that I know a priori should not have an intercept. Currently my…
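In patsy's formula language the intercept is removed by adding `0 +` (or `- 1`) to the right-hand side. The sketch below shows that behaviour in patsy alone, with a made-up predictor `x`. Whether it carries over to `GLM.from_formula` rests on the assumption that the formula string is passed to patsy unchanged.

```python
from patsy import dmatrix

data = {"x": [1.0, 2.0, 3.0]}  # hypothetical predictor

with_intercept = dmatrix("x", data)
without_intercept = dmatrix("0 + x", data)  # "x - 1" is equivalent

print(with_intercept.design_info.column_names)
print(without_intercept.design_info.column_names)
```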
1
vote
1 answer

Standardization Result is different between Patsy & Pandas - Python

I found an interesting question and I would love to hear your interpretation. from patsy import dmatrix,demo_data df = pd.DataFrame(demo_data("a", "b", "x1", "x2", "y", "z column")) Patsy_Standarlize_Output = dmatrix("standardize(x2) +…
vae
  • 132
  • 6
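One likely source of such a discrepancy (an assumption, since the excerpt is truncated) is the degrees-of-freedom convention: patsy's `standardize` defaults to `ddof=0`, while pandas' `std` defaults to `ddof=1`. The standard-library sketch below uses a made-up sample to show how the two conventions give different z-scores.

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
mean = statistics.fmean(x)

# Population standard deviation (divide by n): the ddof=0 convention.
z_population = [(v - mean) / statistics.pstdev(x) for v in x]

# Sample standard deviation (divide by n - 1): the ddof=1 convention.
z_sample = [(v - mean) / statistics.stdev(x) for v in x]

print(z_population[0], z_sample[0])
```

The two results differ by a constant factor of sqrt(n / (n - 1)). If that is the cause, passing `ddof=1` to patsy's `standardize` should bring the outputs into agreement.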
1
vote
0 answers

Statsmodels OLS Mismatch between data argument and columns

I am trying to run a linear model on my data using statsmodels. My dataframe looks like the following: 0 Group Age Education 3_0001 190.8 1.0 47 12 3_0002 482.1 1.0 44 16 4_0003 144.1 …
shall
  • 11
  • 1
1
vote
0 answers

Making predictions for a regression model with cubic splines in Python

I'm building a linear regression model where one of the input variables is number of sales. Rather than using the number of sales per day as a linear input, I want to use some form of cubic spline transformation (because it tends to tail off after a…
DB_DS
  • 29
  • 2