For a prediction problem I am working on, I want to calculate the variance in the data that is explained by the fixed effects of my linear mixed model. To evaluate predictive performance I plan to use five-fold cross-validation.
The common approach to this problem I see online is to use a marginal R². This can be computed with the r.squaredGLMM function from the MuMIn package (https://cran.r-project.org/web/packages/MuMIn/MuMIn.pdf), based on work by Nakagawa et al. (2017). As I understand it, this formula divides the variance explained by the fixed effects by the total model variance:

$$R^2_{\text{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},$$

where $\sigma^2_f$ is the fixed-effects variance, $\sigma^2_\alpha$ the random-effects variance, and $\sigma^2_\varepsilon$ the residual variance.
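For concreteness, here is a minimal sketch of that computation on the training data (the toy data set and the names df, y, x, g are only illustrative):

```r
# Minimal sketch on a toy data set (names df, y, x, g are only illustrative)
library(lme4)
library(MuMIn)

set.seed(1)
df <- data.frame(
  g = factor(rep(1:10, each = 20)),  # grouping factor for the random intercept
  x = rnorm(200)                     # fixed-effect predictor
)
df$y <- 1 + 0.5 * df$x + rep(rnorm(10, sd = 0.8), each = 20) + rnorm(200)

fit <- lmer(y ~ x + (1 | g), data = df)
r.squaredGLMM(fit)  # columns R2m (marginal) and R2c (conditional)
```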
However, I don't know how to make this formula work in a cross-validation setting, because I don't know how to calculate the fixed-effect, random-effect, and residual variances on the test data. My current workaround is to apply the classic OLS formula (1 - var(residuals)/var(observed)), but this does not match the maximum-likelihood framework used for LMMs, and in some folds it even comes out negative.
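This is roughly what I mean by that workaround (again only a sketch, continuing the toy data from above; I use re.form = NA so that predictions come from the fixed effects only, since that is the part I want to evaluate, and the fold assignment is illustrative):

```r
# Sketch of the current workaround: out-of-sample R^2 per fold,
# using fixed-effects-only predictions (re.form = NA)
library(lme4)

set.seed(2)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))

r2_cv <- sapply(1:k, function(i) {
  train <- df[folds != i, ]
  test  <- df[folds == i, ]
  fit_i <- lmer(y ~ x + (1 | g), data = train)
  pred  <- predict(fit_i, newdata = test, re.form = NA)  # fixed effects only
  1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
})
r2_cv  # can be negative when the model predicts worse than the test-set mean
```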
My question has two parts:
- Is there a way to calculate the variances required for the Nakagawa formula on the test data?
- If not, can the 'classic' R² be applied in this setting?