I want to simulate data using patsy. Specifically, I want to specify a model from variables that I randomly generate and have it return the outcome variable (y).
Let's take the following model as an example (just for illustrative purposes):
y = base_sales + is_rainy*is_sunday
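To make the target concrete, here is a minimal NumPy sketch that evaluates this formula directly, without patsy. The arrays `base_sales`, `is_rainy` and `is_sunday` are small hypothetical inputs chosen only for illustration. The interaction is just the elementwise product, so it adds 1 only on rainy Sundays.

```python
import numpy as np

# Hypothetical inputs for four observations (illustration only)
base_sales = np.array([10, 5, 7, 12])
is_rainy = np.array([1, 0, 1, 1])
is_sunday = np.array([1, 1, 0, 1])

# Elementwise: the product is 1 only when it is both rainy and Sunday
y = base_sales + is_rainy * is_sunday
print(y)
```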
Here is my code to generate some fake data:
```python
import pandas as pd
import numpy as np
from patsy import dmatrices, dmatrix

n = 21
data = {"base_sales": np.random.randint(1, 20, n),
        "weekday": np.repeat(range(1, 8), 3),
        "rainy": np.random.choice([1, 0], p=[0.1, 0.9], size=n)}
df = pd.DataFrame(data)
df.head()
```
I then try to build the model with dmatrices:
```python
y, X = dmatrices("y ~ base_sales + C(weekday):rainy", data=df)
```
The code gives the following error message:
```
PatsyError: Error evaluating factor: NameError: name 'y' is not defined
    y ~ base_sales + C(weekday):rainy
```
Question 1: How do I specify the model in order to generate y?
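dmatrices evaluates every name in the formula, including the left-hand side, against the data, which is why a not-yet-existing y fails. One common workaround, sketched here under assumptions, is to build only the right-hand side with dmatrix and then compute y as the design matrix times a coefficient vector you choose. The seeded Generator (instead of np.random.randint) and the coefficient values are assumptions made for this illustration; the coefficients are picked to mirror the example model, with weight 1 on base_sales and on the Sunday column of the interaction.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

rng = np.random.default_rng(42)  # seeded so the simulation is reproducible
n = 21
df = pd.DataFrame({
    "base_sales": rng.integers(1, 20, n),
    "weekday": np.repeat(range(1, 8), 3),
    "rainy": rng.choice([1, 0], p=[0.1, 0.9], size=n),
})

# y does not exist yet, so build only the right-hand side of the formula
X = dmatrix("base_sales + C(weekday):rainy", data=df)
info = X.design_info

# Hypothetical coefficients, one per column of X (all zero to start)
beta = np.zeros(X.shape[1])
beta[info.column_names.index("base_sales")] = 1.0

# Interaction columns come one per weekday level, sorted 1..7,
# so the last one belongs to Sunday (weekday == 7)
inter_cols = [i for i, name in enumerate(info.column_names) if "C(weekday)" in name]
beta[inter_cols[-1]] = 1.0

# y = X @ beta; random noise could be added here if desired
df["y"] = np.asarray(X) @ beta
```

Because the weekday effect is controlled through beta, setting a different entry of the interaction columns selects a different day without adding any column to df.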
Question 2: Is it possible to do something like C(weekday=7) to specify which value of the categorical variable I want to use for the interaction effect? I know I could just create an intermediate column is_sunday; however, I would prefer to avoid this if possible.
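One way to get a single Sunday-by-rain column without storing is_sunday is to compute the indicator inside the formula. This is a minimal sketch, assuming a hypothetical four-row DataFrame: patsy evaluates the argument of I() as ordinary Python against the data, so `(weekday == 7) * rainy` yields a numeric 0/1 column. It sidesteps C() entirely rather than relying on a C(weekday=7)-style argument.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical data for illustration only
df = pd.DataFrame({
    "base_sales": [10, 5, 7, 12],
    "weekday": [7, 7, 3, 7],
    "rainy": [1, 0, 1, 1],
})

# Inside I(...), == and * are plain Python operators, so the Sunday
# indicator is computed on the fly and never stored in df
X = dmatrix("base_sales + I((weekday == 7) * rainy)", data=df,
            return_type="dataframe")
print(X)
```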
Thanks for your help!