
My question is a follow-up to this question on imputation by group using "mice": multiple imputation and multigroup SEM in R

The code in the answer works fine as far as the imputation part goes. But afterwards I am left with a list in which every element is a complete data set, so I have more than one set. The sample looks as follows:

# Set up data frame
df.g1<-data.frame(ID=rep("A",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,10,20)),x3=floor(runif(5,100,150)))
df.g2<-data.frame(ID=rep("B",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,25,50)),x3=floor(runif(5,200,250)))
df.g3<-data.frame(ID=rep("C",5),x1=floor(runif(5,4,5)),x2=floor(runif(5,75,99)),x3=floor(runif(5,500,550)))
df<-rbind(df.g1,df.g2,df.g3)

# Introduce NAs

df$x1[rbinom(15,1,0.1)==1]<-NA
df$x2[rbinom(15,1,0.1)==1]<-NA
df$x3[rbinom(15,1,0.1)==1]<-NA
df

# Impute values by group:

df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(df,m=5)))
df.clean

As you can see, df.clean is a list of 3, one element per group. But each element already contains the complete data set I am looking for.

The original answer suggests using rbind() on the data in df.clean, which leaves me with a new data set of 45 observations (3x the original size). Here is the original code for the last step:

imputed.both <- do.call(args = df.clean, what = rbind)

Which data is the "right" one? And why the last step?

Thanks a bunch!

zx8754
Juan
  • `df.clean` is a list of dataframes whereas `imputed.both` is the same data as one dataframe. What is your question exactly ? – Ronak Shah Nov 11 '19 at 13:02
  • df.clean basically contains the answer I'm looking for three times over, with the imputed data. So which of those three data.frames is the "right" one? Secondly, what's the point of combining these three data frames into one? In my sample it is still relatively easy to check the output, but my original data set has about 500 groups, i.e. I'd like not to increase my data size if not necessary. – Juan Nov 11 '19 at 13:05
  • Hi Juan, i think you misunderstood https://stackoverflow.com/questions/48770037/multiple-imputation-and-multigroup-sem-in-r. The OP needed to impute within each subset, hence the split – StupidWolf Nov 11 '19 at 13:36
  • In your case, you don't need to do that, and you can see that in your function, lapply(split(df,df$ID), function(x) mice::complete(mice(df,m=5))), x is practically useless. – StupidWolf Nov 11 '19 at 13:36
  • Just use, mice::complete(mice(df,m=5)) and it should be ok for what you need – StupidWolf Nov 11 '19 at 13:37
  • @StupidWolf I chose the original question because of the imputation by group. As you can see in my sample data x2 for group "A" can only take on values between 10 and 20, whereas x2 for group"C" has a range between 75 and 99. I find it counter intuitive to do the imputation over the complete sample as any value for A;x2 far outside the 10-20 is not helping. Does this help to understand my question ? ;) – Juan Nov 11 '19 at 14:00
  • yes. I get it now! you need to do this : df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5))) – StupidWolf Nov 11 '19 at 14:04
  • because in the lapply, you are passing a subset of df( when df has ID A for example), then you need to do a complete on this subset, which is x. You can see split(df,df$ID), it's a list of 3 elements, hence, the mice function loops through each of this element – StupidWolf Nov 11 '19 at 14:05
  • But isnt that exactly what i described in the question above? Resulting in my original question? – Juan Nov 11 '19 at 14:09
  • Hi Juan, can you run these two lines: df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5))) ; dim(do.call(rbind,df.clean)) – StupidWolf Nov 11 '19 at 14:18
  • you get 15 rows, not 45 in your question. – StupidWolf Nov 11 '19 at 14:19
  • When I do imputed.both <- do.call(args = df.clean, what = rbind) i get 45. When i run your code, the second line returns ``` > df.clean [1] 45 5 ``` – Juan Nov 11 '19 at 14:26
  • Ok i post it as an answer, i hope it's clearer now – StupidWolf Nov 11 '19 at 14:42

1 Answer


There's a bug in the code; I have an edited version below that works:

#Set up data frame
set.seed(12345)
df.g1<-data.frame(ID=rep("A",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,10,20)),x3=floor(runif(5,100,150)))
df.g2<-data.frame(ID=rep("B",5),x1=floor(runif(5,0,2)),x2=floor(runif(5,25,50)),x3=floor(runif(5,200,250)))
df.g3<-data.frame(ID=rep("C",5),x1=floor(runif(5,4,5)),x2=floor(runif(5,75,99)),x3=floor(runif(5,500,550)))
df<-rbind(df.g1,df.g2,df.g3)

#Introduce NAs

df$x1[rbinom(15,1,0.1)==1]<-NA
df$x2[rbinom(15,1,0.1)==1]<-NA
df$x3[rbinom(15,1,0.1)==1]<-NA
# check NAs
colSums(is.na(df))

#Impute values by group:

# the bug was here: mice() must be called on x (the group subset), not on df
df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5)))
imputed.both <- do.call(args = df.clean, what = rbind)
dim(imputed.both)
# returns 15,4

In the code in the question, you have

df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(df,m=5)))
dim(do.call(rbind,df.clean))
#this returns 45,4

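The function is specified with "x", but inside it you call df from the global environment. Hence you impute on the complete df once per group, and each of the 3 list elements holds all 15 rows, which gives 45 rows after rbind.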
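The effect can be reproduced without mice. The sketch below is only an illustration: `fake_impute` is a hypothetical stand-in that returns its input unchanged, and the toy data frame is made up for the demonstration.

```r
# Toy data: 3 groups of 5 rows each (no mice needed for this demonstration).
df <- data.frame(ID = rep(c("A", "B", "C"), each = 5), x1 = 1:15)

# Hypothetical stand-in for mice::complete(mice(...)): returns its input unchanged.
fake_impute <- function(d) d

# Buggy pattern: x is ignored, so the full df is returned once per group.
buggy <- lapply(split(df, df$ID), function(x) fake_impute(df))

# Fixed pattern: only the group subset x is processed.
fixed <- lapply(split(df, df$ID), function(x) fake_impute(x))

rows_buggy <- nrow(do.call(rbind, buggy))  # 3 groups x 15 rows = 45
rows_fixed <- nrow(do.call(rbind, fixed))  # 5 + 5 + 5 rows = 15
```

With the buggy pattern every list element is a full copy of the data, which is why any single element already looked like "the answer".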

So to answer your question, if you do this step:

split(df,df$ID)

You split your data frame into a list of data.frames, each containing only the rows for A, B or C. Then if you lapply through this list, you get

df.clean<-lapply(split(df,df$ID), function(x) mice::complete(mice(x,m=5)))
names(df.clean)
lapply(df.clean,dim)

Each item of the list df.clean contains the imputed subset of the original df for one ID (A, B or C). Now you combine this list back into a single data.frame using:

imputed.both <- do.call(rbind,df.clean)
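One cosmetic side effect, not discussed above: when rbind is applied to a named list, the row names of the combined data frame may carry the list names as a prefix. A minimal sketch on made-up toy data, assuming you simply want plain sequential row names afterwards:

```r
# Toy data standing in for the imputed per-group results.
df <- data.frame(ID = rep(c("A", "B", "C"), each = 2), x1 = 1:6)
parts <- split(df, df$ID)          # named list: A, B, C

combined <- do.call(rbind, parts)  # one data frame again
rownames(combined) <- NULL         # reset row names to 1..n
```

The combined data frame has the same number of rows as the original, so the data size does not grow.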
StupidWolf
  • Thanks again for pointing that typo out. When running the code on my actual data, it bugs out after a while. I think this is because there are too few observations per group. Would you know if this is the case, or if there might be a different reason for it? The error message is as follows: Error in edit.setup(data, setup, ...) : nothing left to impute In addition: There were 16 warnings (use warnings() to see them) Called from: edit.setup(data, setup, ...) Any idea? – Juan Nov 16 '19 at 11:34
  • Hey @Juan, I came across that error message once or twice.. I honestly cannot remember whether it matters or not. I was checking some of my recent use, I think transforming the variables is needed in some case.. Hopefully this helps you – StupidWolf Nov 17 '19 at 03:38
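
The thread does not settle what triggers this error. As a defensive sketch that is not part of the original answer, the per-group call can be wrapped in tryCatch so that a failing group is reported instead of aborting the whole loop. The helper name `impute_by_group` and its arguments are hypothetical; `impute_fun` is whatever function you apply to each group.

```r
# Hypothetical helper: apply impute_fun to each group and keep going on errors.
impute_by_group <- function(data, group, impute_fun) {
  parts <- split(data, data[[group]])
  # sapply over the names with simplify = FALSE returns a list named by group.
  results <- sapply(names(parts), function(g) {
    tryCatch(
      impute_fun(parts[[g]]),
      error = function(e) {
        message("Group ", g, " failed: ", conditionMessage(e))
        NULL  # mark this group as failed
      }
    )
  }, simplify = FALSE)
  is_failed <- vapply(results, is.null, logical(1))
  list(
    imputed = do.call(rbind, results[!is_failed]),  # successfully processed groups
    failed  = names(results)[is_failed]             # groups to inspect by hand
  )
}
```

With mice installed, a call could look like impute_by_group(df, "ID", function(x) mice::complete(mice::mice(x, m = 5))); the groups listed in failed can then be inspected individually.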