I am currently trying to impute missing entries in a three-level dataset using the `mi` package.
I am facing two issues with how to impute multilevel data correctly:
- Firstly, the variables at levels two and three are imputed with different values within the same cluster/ID.
- Secondly, I am unable to set the boundaries for a continuous variable that cannot take values outside a specific interval.
Problem 1. How to create a `multilevel_missing_data.frame`

The documentation states that:

> Objects from the Class
>
> Objects can be created by calls of the form `new("multilevel_missing_data.frame", ...)`. However, its users almost always will pass a `data.frame` to the `missing_data.frame` function and specify the `subclass` and `groups` arguments.
>
> Slots
>
> The `multilevel_missing_data.frame` class inherits from the `missing_data.frame`-class and has two additional slots
>
> - `groups` Object of class `character` indicating which variables define the multilevel structure
> - `mdf_list` Object of class `mdf_list` whose elements contain a `missing_data.frame` for each group. This slot is filled automatically by the `initialize` method.
If I understand the documentation correctly, it should be possible to create a `multilevel_missing_data.frame` by specifying `subclass` and `groups`:
```r
library(dplyr)
library(magrittr)
library(mi)

# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url), mode = "wb")  # binary mode; matters on Windows
load("example_3l.Rdata")

dat %<>%
  mutate(
    z = as.ordered(round(z) + 2),
    x = abs(x)
  )

# Create multilevel_missing_data.frame
mdf <- missing_data.frame(
  dat,
  subclass = "multilevel",
  groups = c("class", "school")
)
```

Output:

```
Error in getClass(Class, where = topenv(parent.frame())): “NA” is not a defined class
```
Problem 2. Defining the boundaries of the `bounded-continuous` class
According to the documentation:

> Objects can be created that are of `bounded-continuous` class via the `missing_variable` generic function by specifying `type = "bounded-continuous"` as well as `lower` and / or `upper`
This means that I should be able to define a variable as `bounded-continuous` in the following way:
```r
missing_variable(
  dat$x,
  type = "bounded-continuous",
  lower = 0, upper = 5
)
```
However, I cannot figure out how to add the variable defined this way to the `missing_data.frame` object together with all the other variables.

"Working" example (i.e. the code runs, but it does not impute the values correctly, because the grouping variables are not defined and the boundaries for `x` are not set):
```r
library(dplyr)
library(magrittr)
library(mi)

# loading and preparing data
url <- "https://simongrund1.github.io/posts/multiple-imputation-for-three-level-and-cross-classified-data_files/example_3l.Rdata"
download.file(url, basename(url), mode = "wb")  # binary mode; matters on Windows
load("example_3l.Rdata")

dat %<>%
  mutate(
    z = as.ordered(round(z) + 2),
    x = abs(x)
  )

mdf <- missing_data.frame(dat)  # Here I should define grouping variables somehow

mdf <- change(
  mdf,
  y = c("class", "school", "x"),
  what = "type",
  to = c("group", "group", "bounded-continuous")
)  # Boundaries for "x" should be added here

mi_mdf <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = 20)
mi_mdf <- complete(mi_mdf, m = 1)
mi_mdf %>% select(class:w)
```
Output (incorrect):
- Variable "z" has varying values imputed for the same level 2 ("class") ID
- Variable "w" has varying values imputed for the same level 3 ("school") ID
```
class school          x            y z               w
<dbl>  <dbl>      <dbl>        <dbl> <ord>       <dbl>
    2      1 0.32023493 -1.024494036 3        1.329629
    2      1 0.06949615  0.547773458 3        1.329629
    2      1 1.98737694  0.287954055 1        1.329629
    ⋮      ⋮          ⋮            ⋮ ⋮               ⋮
  250     50 2.54218522   1.63412995 1     -1.52528136
  250     50 2.11927441   0.60549683 1     -1.17877900
  250     50 2.01830248   1.27016541 1     -1.74640219
```
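To make the symptom easy to check on any completed dataset, the within-cluster variation can be counted directly. The sketch below uses a hypothetical base-R helper, `varying_within()` (not part of `mi`); the toy data frame simply mirrors the rows printed above, with values rounded. An empty result means the variable is constant within every cluster, which is what a level-2 or level-3 variable should look like.

```r
# Hypothetical helper (not part of mi): returns the IDs of `group` within
# which `var` takes more than one distinct value. An empty result means
# `var` is constant within every cluster.
varying_within <- function(data, var, group) {
  n_values <- tapply(data[[var]], data[[group]], function(v) length(unique(v)))
  names(n_values)[n_values > 1]
}

# Toy frame mirroring the rows printed above (values rounded)
toy <- data.frame(
  class  = c(2, 2, 2, 250, 250, 250),
  school = c(1, 1, 1, 50, 50, 50),
  z      = factor(c(3, 3, 1, 1, 1, 1), ordered = TRUE),
  w      = c(1.33, 1.33, 1.33, -1.53, -1.18, -1.75)
)

varying_within(toy, "z", "class")   # "2": z varies within class 2
varying_within(toy, "w", "school")  # "50": w varies within school 50
```

On the real data this would be called as, e.g., `varying_within(mi_mdf, "z", "class")`; it only diagnoses the problem and does not change how `mi` imputes.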
How can I define `class` as level 2 and `school` as level 3 variables, and how can I set the boundaries of `x` to 0 (lower) and 10 (upper)?
Desired output:
- Variable `z` has the same imputed value for each level 2 (`class`) ID.
- Variable `w` has the same imputed value for each level 3 (`school`) ID.
- `x` does not seem to have values outside the desired range (my assumption is that when a variable is defined as `bounded-continuous`, the boundaries are automatically set from the range of the observed values). However, the boundaries should nonetheless be defined explicitly as the interval within which the values can lie.
```
class school          x            y z               w
<dbl>  <dbl>      <dbl>        <dbl> <ord>       <dbl>
    2      1 0.32023493 -1.024494036 3        1.329629
    2      1 0.06949615  0.547773458 3        1.329629
    2      1 1.98737694  0.287954055 3        1.329629
    ⋮      ⋮          ⋮            ⋮ ⋮               ⋮
  250     50 2.54218522   1.63412995 1     -1.52528136
  250     50 2.11927441   0.60549683 1     -1.52528136
  250     50 2.01830248   1.27016541 1     -1.52528136
```
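The bounds requirement can be checked in a similar way. The following base-R sketch uses a hypothetical helper, `out_of_bounds()` (not part of `mi`), that returns the positions of values outside a closed interval. A clean result on one completed dataset only shows that no value happened to fall outside [0, 10]; it does not show that `mi` was actually given those bounds.

```r
# Hypothetical helper (not part of mi): positions of values in `x` that fall
# outside the closed interval [lower, upper]. which() treats NA as FALSE,
# so missing entries are never reported.
out_of_bounds <- function(x, lower, upper) {
  which(x < lower | x > upper)
}

# x values from the rows printed above
x_shown <- c(0.32023493, 0.06949615, 1.98737694, 2.54218522, 2.11927441, 2.01830248)
out_of_bounds(x_shown, lower = 0, upper = 10)             # integer(0)
out_of_bounds(c(-0.5, 3, 12, NA), lower = 0, upper = 10)  # 1 3
```

Applied to the completed data, this would be `out_of_bounds(mi_mdf$x, lower = 0, upper = 10)`.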