1

I have yearly data over time (longitudinal data) with repeated measures for many of the subjects. I think I need multilevel modeling/regressions to deal with sure-to-be correlated clusters of measurements for the same individuals over time. The data currently is in separate tables for each year.

I was wondering if there was a way that was built into scikit-learn, like LinearRegression(), that would be able to conduct a multilevel regression where Level 1 is all the data over the years, and Level 2 is for the clustered on the subjects (clusters for each subject's measurements over time). And if so, if it's better to have the longitudnal data laid out length-wise (where the each subject's measures over time are all in one row) or stacked (where each measure for each year is it's own row).

Is there a way to do this?

exlo
  • 315
  • 1
  • 8
  • 20

1 Answers1

2

Estimation of random effects in multilevel models is non-trivial and you typically have to resort to Bayesian inference methods.

I would suggest you look into Bayesian inference packages such as pymc3 or BRMS (if you know R) where you can specify such a model. Or alternatively, look at lme4 package in R for a fully-frequentist implementation of multi-level models.

Also, I think you would be able to get some inspiration from the "sleep-deprivation" dataset which is used as a textbook example of longitudinal data-analysis (https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf) pg.4

To get started in pymc3 have a look here:

https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section4_7-Multilevel-Modeling.ipynb

Mehtab Pathan
  • 443
  • 4
  • 15