Bear with me, as I'm new to this level of statistics and to Python. I've read the statsmodels and patsy documentation but still have doubts.
I am trying to analyse longitudinal data using statsmodels MixedLM. Simplified a bit, I have 5 variables, with no collinearity between independent variables:
- Outcome: the dependent variable.
- Patient: the random effect, as each patient has multiple measurements of the outcome
- Time: a fixed effect
- Targeted: a fixed effect, 0 = no, 1 = yes, whether or not the patient was targeted for an intervention to address the outcome
- Sex: a fixed effect, 0 = male, 1 = female
I want to know 2 things:
- Is there an association between whether the patient was targeted and the outcome trends over time?
- Is there an association between patient sex and outcome trends over time, among the targeted group only?
Maybe important: I'm not trying to make any predictions, just to accurately explain the data that I already have.
To answer the first question, I tried:
import statsmodels.formula.api as smf
md = smf.mixedlm('outcome ~ time * targeted', df, groups=df['patient'])
Is this notation correct? Or should I use:
md = smf.mixedlm('outcome ~ time : targeted', df, groups = df['patient'])
to better compare the difference in outcome trends? Or something else?
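For context, here is a minimal sketch of how patsy expands the two operators. The data frame `toy` is made up purely to inspect the design-matrix columns and is not my real data.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data, used only to look at the column names patsy builds.
toy = pd.DataFrame({
    "time": [0, 1, 2, 0, 1, 2],
    "targeted": [0, 0, 0, 1, 1, 1],
})

# `a * b` is shorthand for `a + b + a:b` (both main effects plus the interaction).
star_cols = dmatrix("time * targeted", toy).design_info.column_names

# `a : b` on its own keeps only the product term (the intercept is still added).
colon_cols = dmatrix("time : targeted", toy).design_info.column_names

print(star_cols)
print(colon_cols)
```

With `:` alone there is no separate `time` or `targeted` term, so as far as I can tell the non-targeted group would be forced to have a flat trend and the same intercept as the targeted group. In the `*` version, the `time:targeted` coefficient is the difference in slope between the two groups.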
To answer the second question, I tried:
md = smf.mixedlm('outcome ~ time * targeted * sex', df, groups = df['patient'])
But I don't think this is correct, because the coefficients don't make sense. Targeted patients must have a starting outcome of > 6, but the coefficient for targeted:sex is < 6. One solution is to make a separate dataframe that includes only the targeted patients, but I'm curious whether I can use the formula operators differently to get what I want.
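The separate-dataframe approach could look like the sketch below. The data are synthetic (the effect sizes, the number of patients, and the balanced assignment of `targeted` and `sex` are all assumptions made for illustration), and only the column names match my real data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic longitudinal data: 40 hypothetical patients, 5 visits each.
rng = np.random.default_rng(0)
n_patients, n_visits = 40, 5
patient = np.repeat(np.arange(n_patients), n_visits)
time = np.tile(np.arange(n_visits), n_patients)
targeted = patient % 2            # alternate targeted / not targeted
sex = (patient // 2) % 2          # balanced within each targeted group
# One random intercept per patient, repeated for each of that patient's visits.
random_intercept = np.repeat(rng.normal(0, 1, n_patients), n_visits)
noise = rng.normal(0, 0.5, n_patients * n_visits)
outcome = 5 + 2 * targeted - 0.3 * time - 0.2 * time * sex + random_intercept + noise

df = pd.DataFrame({
    "patient": patient, "time": time, "targeted": targeted,
    "sex": sex, "outcome": outcome,
})

# Keep only the targeted patients, then model time, sex and their interaction.
targeted_df = df[df["targeted"] == 1]
md = smf.mixedlm("outcome ~ time * sex", targeted_df, groups=targeted_df["patient"])
mdf = md.fit()

# `time:sex` is the difference in slope between sexes within the targeted group.
print(mdf.summary())
```

In this subset model there is no `targeted` term at all, so the coefficients describe the targeted group directly rather than as offsets from the non-targeted group.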
Thank you!