
I am trying to impute missing entries in a three-level dataset using the mi package.

I am facing two issues with imputing the multilevel data correctly:

  • Firstly, the variables at level two and three are imputed with different values within the same cluster/ID.
  • Secondly, I am also unable to set the boundaries for a continuous variable that cannot have values outside a specific interval.

Problem 1. How to create a multilevel_missing_data.frame

The documentation states that:

Objects from the Class

Objects can be created by calls of the form new("multilevel_missing_data.frame", ...). However, its users almost always will pass a data.frame to the missing_data.frame function and specify the subclass and groups arguments.

Slots

The multilevel_missing_data.frame class inherits from the missing_data.frame-class and has two additional slots

  • groups Object of class character indicating which variables define the multilevel structure
  • mdf_list Object of class mdf_list whose elements contain a missing_data.frame for each group. This slot is filled automatically by the initialize method.

If I understand the documentation correctly, it should be possible to create a multilevel_missing_data.frame by specifying subclass and groups:

library(dplyr)
library(magrittr)
library(mi)

# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url))
load("example_3l.Rdata")
dat %<>%
mutate(
    z = as.ordered(round(z)+2),
    x = abs(x)
)

# Create multilevel_missing_data.frame
mdf <- missing_data.frame(
   dat,
   subclass = "multilevel",
   groups = c("class", "school")
)
# Output:
> Error in getClass(Class, where = topenv(parent.frame())): “NA” is not a defined class

Problem 2. Defining the boundaries in the bounded-continuous class

According to the documentation

Objects can be created that are of bounded-continuous class via the missing_variable generic function by specifying type = "bounded-continuous" as well as lower and / or upper

This means that I should be able to define a variable as bounded-continuous the following way:

missing_variable(
   dat$x,
   type = "bounded-continuous",
   lower = 0, upper = 5
)

However, I cannot figure out how to add the defined variable to the missing_data.frame object together with all the other variables.

"working" example

(i.e. the code runs, but it doesn't impute the values correctly as the grouping variables are not defined, and boundaries for x are not set.):

library(dplyr)
library(magrittr)
library(mi)

# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url))
load("example_3l.Rdata")
dat %<>%
mutate(
    z = as.ordered(round(z)+2),
    x = abs(x)
)

mdf <- missing_data.frame(dat) # Here I should define grouping variables somehow
mdf <- change(
    mdf,
    y = c("class", "school", "x"),
    what = "type",
    to = c("group", "group", "bounded-continuous")
) # Boundaries for "x" should be added here

mi_mdf <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = 20)
mi_mdf <- complete(mi_mdf, m = 1)

mi_mdf %>% select(class:w)

Output (incorrect):
  • Variable "z" has varying values imputed for the same level 2 ("class") ID
  • Variable "w" has varying values imputed for the same level 3 ("school") ID
class school x          y            z     w
<dbl> <dbl>  <dbl>      <dbl>        <ord> <dbl>
2      1     0.32023493 -1.024494036     3  1.329629 
2      1     0.06949615  0.547773458     3  1.329629
2      1     1.98737694  0.287954055     1  1.329629
⋮      ⋮              ⋮           ⋮    ⋮         ⋮
250   50     2.54218522   1.63412995     1 -1.52528136
250   50     2.11927441   0.60549683     1 -1.17877900
250   50     2.01830248   1.27016541     1 -1.74640219

How can I define class as level 2 and school as level 3 variables, and how can I set the boundaries of x to 0 (lower) and 10 (upper)?

Desired output:
  • Variable "z" has the same imputed value for each level 2 ("class") ID
  • Variable "w" has the same imputed values for each level 3 ("school") ID
  • "x" does not appear to have values outside the desired range. (My assumption is that when a variable is defined as bounded-continuous, the boundaries are set automatically from the range of the observed values.) However, the boundaries should nonetheless be defined explicitly as the interval within which the values can lie.
class school x          y            z     w
<dbl> <dbl>  <dbl>      <dbl>        <ord> <dbl>
2      1     0.32023493 -1.024494036     3  1.329629
2      1     0.06949615  0.547773458     3  1.329629
2      1     1.98737694  0.287954055     3  1.329629
⋮      ⋮              ⋮           ⋮    ⋮         ⋮
250   50     2.54218522   1.63412995     1 -1.52528136
250   50     2.11927441   0.60549683     1 -1.52528136
250   50     2.01830248   1.27016541     1 -1.52528136
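
To make the three conditions above concrete, here is a minimal sketch of how they could be checked on a completed data set. It uses base R only and does not touch the mi package. The helper `is_constant_within` and the `toy` data frame are hypothetical and made up for illustration; the bounds 0 and 10 are the ones asked for in the question.

```r
# Hypothetical helper (not part of mi): TRUE if `value` takes exactly one
# distinct value within every level of `cluster`.
is_constant_within <- function(value, cluster) {
  n_distinct <- tapply(as.character(value), cluster,
                       function(v) length(unique(v)))
  all(n_distinct == 1)
}

# Toy stand-in for a completed data set; all values are invented.
toy <- data.frame(
  class  = c(1, 1, 2, 2, 3, 3),
  school = c(1, 1, 1, 1, 2, 2),
  x      = c(0.3, 1.9, 2.5, 0.1, 4.2, 9.7),
  z      = factor(c(3, 3, 1, 1, 2, 1), ordered = TRUE),
  w      = c(1.3, 1.3, 1.3, 1.3, -1.5, -1.5)
)

z_ok <- is_constant_within(toy$z, toy$class)   # FALSE: class 3 has z = 2 and z = 1
w_ok <- is_constant_within(toy$w, toy$school)  # TRUE: w is constant within each school
x_ok <- all(toy$x >= 0 & toy$x <= 10)          # TRUE: every x lies in [0, 10]
```

Applied to the output of `complete()`, the same three checks would presumably flag the incorrect result shown earlier and pass for the desired one.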
Pål Bjartan
  • In the docs, it says `Objects can be created by calls of the form new("multilevel_missing_data.frame", ...).` and then you call `missing_data.frame` – akrun Jul 20 '22 at 15:56
  • I am uncertain how to use `new()` for this purpose. Following your quote, the documentation also states "However, its users almost always will pass a `data.frame` to the `missing_data.frame` function and specify the `subclass` and `groups` arguments." See my updated post for elaboration. – Pål Bjartan Jul 20 '22 at 23:56
