
I want to simulate data using patsy. Specifically I want to specify a model from variables which I randomly generate and return the outcome variable (y).

Let's take the following model as an example (just for illustrative purposes):

y = base_sales + rainy*is_sunday

Here is my code to generate some fake data:

import pandas as pd
import numpy as np
from patsy import dmatrices, dmatrix

n = 21
data = {"base_sales": np.random.randint(1, 20, n),
        "weekday": np.repeat(range(1, 8), 3),
        "rainy": np.random.choice([1, 0], p=[0.1, 0.9], size=n)}
df = pd.DataFrame(data)
df.head()

I then run the following model:

y, X = dmatrices("y ~ base_sales + C(weekday):rainy", data=df)

The code gives the following error message:

PatsyError: Error evaluating factor: NameError: name 'y' is not defined
    y ~ base_sales + C(weekday):rainy

Question 1: how do I specify the model in order to generate y?
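For context, here is a minimal sketch of what I am trying to achieve. It assumes that a one-sided formula (no `y ~`) passed to `dmatrix` builds only the design matrix, and that y can then be simulated by hand as a linear combination of its columns plus noise. The coefficients `beta` and the noise scale are made up for illustration; they are not part of my actual model.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices, dmatrix

rng = np.random.default_rng(0)
n = 21
df = pd.DataFrame({
    "base_sales": rng.integers(1, 20, n),
    "weekday": np.repeat(np.arange(1, 8), 3),
    "rainy": rng.choice([1, 0], p=[0.1, 0.9], size=n),
})

# One-sided formula: there is no "y ~", so patsy only builds X.
X = dmatrix("base_sales + C(weekday):rainy", data=df)
column_names = X.design_info.column_names

# Hypothetical "true" coefficients, one per column of X (assumption).
beta = rng.normal(size=len(column_names))

# Simulated outcome: linear predictor plus Gaussian noise (assumed scale 1.0).
df["y"] = np.asarray(X) @ beta + rng.normal(scale=1.0, size=n)

# y now exists in df, so the original two-sided formula can be evaluated.
y, X2 = dmatrices("y ~ base_sales + C(weekday):rainy", data=df)
```

This works around the NameError because patsy looks up every name in the formula, including the left-hand side, in `data`. I am not sure whether patsy offers a more direct way to do this.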

Question 2: is it possible to do something like C(weekday=7) in order to specify which value of the categorical variable I want to use for the interaction effect? I know I could just create an intermediate column is_sunday. However, I would prefer to avoid this if possible.
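To make the intent concrete, here is a sketch of one approach that seems to produce the column I want. It assumes that patsy's `I()` wrapper evaluates its argument as ordinary Python, so that `(weekday == 7) * rainy` is computed inline as a numeric 0/1 column. The small DataFrame is a deterministic stand-in for my data, and I have not checked whether this is the recommended way.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Deterministic stand-in for the data above (values are illustrative).
df = pd.DataFrame({
    "base_sales": np.arange(1, 22),
    "weekday": np.repeat(np.arange(1, 8), 3),
    "rainy": np.tile([1, 0, 1], 7),
})

# I(...) protects the expression from formula parsing; the boolean
# (weekday == 7) times the integer rainy gives a single numeric column.
X = dmatrix("base_sales + I((weekday == 7) * rainy)", data=df)

# Reference values computed directly with pandas for comparison.
expected = ((df["weekday"] == 7) * df["rainy"]).to_numpy()
sunday_rain = np.asarray(X)[:, -1]
```

Here `sunday_rain` should equal `expected`, i.e. 1 only on rainy Sundays, without an is_sunday column ever being added to `df`.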

Thanks for your help!

FredMaster