-1

When I try to use predict(), or predictInterval(), or bootMer() by giving it newdata (even if it's the same data that I built the model with), I get the following error:

Error in [.data.frame(fr, vars) : undefined columns selected

I've been using the tidyverse package so I thought it might have to do with the tibble, but converting to data.frame (using as.data.frame) does not solve the problem.

I have also tried changing column names so that they don't contain spaces and getting rid of NAs in the data.

Here's an example:

library(tidyverse)
library(lme4)
library(lattice)
library(merTools)

df = read_csv(file = "~/Documents/Experiments/Myles Study/mylesdata.csv")

names(df)[names(df) == "h3"] = "hr3"

spms = df %>%
  dplyr::select(
    -c(
      mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, spm, spm1, spm2, spm3, spm4, spm5, spm6) 
spms$speed_lev = factor(spms$speed_lev)
levels(spms$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


mets = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, mets, mets1, mets2, mets3, mets4, mets5, mets6)
mets$speed_lev = factor(mets$speed_lev)
levels(mets$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


rpe = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , hr1, hr2, hr3, hr4, hr5, hr6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, rpe, rpe1, rpe2, rpe3, rpe4, rpe5, rpe6)
rpe$speed_lev = factor(rpe$speed_lev)
levels(rpe$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


hr = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , sl1, sl2, sl3, sl4, sl5, sl6
    )
  ) %>%
  gather(speed_lev, hr, hr1, hr2, hr3, hr4, hr5, hr6)
hr$speed_lev = factor(hr$speed_lev)
levels(hr$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


sl = df %>% 
  dplyr::select(
    -c(
      spm1, spm2, spm3, spm4, spm5, spm6
      , mets1, mets2, mets3, mets4, mets5, mets6
      , rpe1, rpe2, rpe3, rpe4, rpe5, rpe6
      , hr1, hr2, hr3, hr4, hr5, hr6
    )
  ) %>%
  gather(speed_lev, sl, sl1, sl2, sl3, sl4, sl5, sl6)
sl$speed_lev = factor(sl$speed_lev)
levels(sl$speed_lev) = c("1.5", "2", "2.5", "3.5", "4.0", "4.5")


dat = left_join(spms, mets)
dat = left_join(dat, rpe)
dat = left_join(dat, hr)
dat = left_join(dat, sl)
names(dat)[names(dat) == "speed_lev"] = "speed (m/h)"
dat$`speed (m/h)` = as.numeric(as.character(dat$`speed (m/h)`))

dat$mets_sq = dat$mets^2
dat$mets_cubed = dat$mets^3

m3 = lmer(spm ~  `leg length (cm)` + mets_cubed + mets_sq + mets + (1 | subject), data = dat)
summary(m3)

The summary is:

Linear mixed model fit by REML ['lmerMod']
Formula: spm ~ `leg length (cm)` + mets_cubed + mets_sq + mets + (1 |      subject)
   Data: dat

REML criterion at convergence: 1562.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8094 -0.5213  0.1120  0.5638  3.0933 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 49.41    7.029   
 Residual             26.90    5.186   
Number of obs: 238, groups:  subject, 43

Fixed effects:
                  Estimate Std. Error t value
(Intercept)       41.30017   18.13968   2.277
`leg length (cm)` -0.58471    0.16535  -3.536
mets_cubed         0.44777    0.09343   4.793
mets_sq           -8.67702    1.37655  -6.303
mets              60.66036    6.36756   9.526

Correlation of Fixed Effects:
            (Intr) `l(c)` mts_cb mts_sq
`lglng(cm)` -0.863                     
mets_cubed  -0.434 -0.045              
mets_sq      0.449  0.045 -0.994       
mets        -0.459 -0.045  0.974 -0.993

Then I try to make predictions:

predict(m3, dat)

predictInterval(
  m3
  , dat
  , level = 0.95
  , n.sims = 1000
  , stat = "median"
  , type="linear.prediction"
  , include.resid.var = TRUE
  )

And get the errors:

Error in `[.data.frame`(fr, vars) : undefined columns selected

Error in `[.data.frame`(fr, vars) : undefined columns selected
In addition: Warning message:
In predictInterval(m3, dat, level = 0.95, n.sims = 1000, stat = "median",  :
  newdata is tbl_df or tbl object from dplyr package and has been
              coerced to a data.frame

Session info:

sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] merTools_0.3.0  arm_1.9-3       MASS_7.3-45     lattice_0.20-34 lme4_1.1-12     Matrix_1.2-7.1  dplyr_0.5.0    
 [8] purrr_0.2.2     readr_1.0.0     tidyr_0.6.0     tibble_1.2      ggplot2_2.1.0   tidyverse_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      nloptr_1.0.4     plyr_1.8.4       base64enc_0.1-3  tools_3.3.2      digest_0.6.10   
 [7] jsonlite_1.1     evaluate_0.10    blme_1.0-4       nlme_3.1-128     gtable_0.2.0     psych_1.6.9     
[13] shiny_1.0.0      DBI_0.5          yaml_2.1.14      parallel_3.3.2   mvtnorm_1.0-5    coda_0.18-1     
[19] stringr_1.1.0    knitr_1.15       htmlwidgets_0.8  grid_3.3.2       DT_0.2           R6_2.1.3        
[25] rmarkdown_1.1    foreign_0.8-67   minqa_1.2.4      reshape2_1.4.1   magrittr_1.5     scales_0.4.0    
[31] htmltools_0.3.5  splines_3.3.2    assertthat_0.1   abind_1.4-5      mnormt_1.5-5     xtable_1.8-2    
[37] mime_0.5         colorspace_1.2-6 httpuv_1.3.3     labeling_0.3     stringi_1.1.1    lazyeval_0.2.0  
[43] munsell_0.4.3    broom_0.4.1  
  • 1
    interesting question, but can we have a reproducible example and the results of `sessionInfo()` (or just `packageVersion("lme4")`) please? – Ben Bolker Mar 07 '17 at 14:49
  • First you should try with a minimal setup, i.e., an R session without this huge amount of packages loaded. Then you can add one package at a time and test which one causes the problem. I would not be surprised if the Hadley's package ecosystem with its bad practices of masking functions is the culprit. – Roland Mar 08 '17 at 07:24

1 Answers1

1

Roland's suspicion was correct. The tidyverse package (ecosystem) was masking the predict() functions of interest.

  • 1
    can you be a little more specific and tell us the results of `find("predict")` , i.e. exactly which package is it? – Ben Bolker Mar 11 '17 at 13:38