
I imputed a dataset using the mice package in R. The dataset contains a factor variable with n levels. I would like to calculate the proportion of each factor level and return standard errors that account for the imputed values. Below is sample code.

```r
library(mice)

data(nhanes)
nhanes$hyp <- as.factor(nhanes$hyp)

imp <- mice(nhanes, method = c("polyreg", "pmm", "logreg", "pmm"), seed = 23109)
m <- imp$m

Q <- rep(NA, m)
U <- rep(NA, m)

for (i in 1:m) {
  Q[i] <- mean(complete(imp, i)$hyp)
  U[i] <- var(complete(imp, i)$hyp) / nrow(nhanes)  # (standard error of estimate)^2
}

pool.scalar(Q, U, method = "rubin")  # Rubin 1987
```

This displays the following results:

```
> pool.scalar(Q, U, method = "rubin") # Rubin 1987
$m
[1] 5

$qhat
[1] NA NA NA NA NA

$u
[1] 0.006666667 0.009066667 0.008400000 0.009066667 0.006666667

$qbar
[1] NA

$ubar
[1] 0.007973333

$b
[1] NA

$t
[1] NA

$r
[1] NA

$df
[1] NA

$fmi
[1] NA

$lambda
[1] NA
```
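Every NA in this output traces back to `Q`. `mean()` on a factor returns NA with a warning, and `pool.scalar()` combines the estimates with Rubin's rules, in which every component except `ubar` (the average of `U`) depends on `Q`. The base-R sketch below of the core Rubin (1987) formulas shows how a single NA propagates. The helper name `rubin_pool` is hypothetical and not part of mice. It is meant to mirror `qbar`, `ubar`, `b`, `t`, `r`, `lambda` and the large-sample `df`, but the exact `pool.scalar()` output may differ between mice versions.

```r
# Rubin's (1987) rules for pooling a scalar estimate over m imputations.
# rubin_pool() is a hypothetical helper written for illustration only.
rubin_pool <- function(Q, U) {
  m      <- length(Q)
  qbar   <- mean(Q)                 # pooled point estimate
  ubar   <- mean(U)                 # average within-imputation variance
  b      <- var(Q)                  # between-imputation variance
  t      <- ubar + (1 + 1 / m) * b  # total variance
  r      <- (1 + 1 / m) * b / ubar  # relative increase in variance due to nonresponse
  lambda <- (1 + 1 / m) * b / t     # proportion of total variance due to nonresponse
  df     <- (m - 1) / lambda^2      # Rubin's large-sample degrees of freedom
  list(m = m, qbar = qbar, ubar = ubar, b = b, t = t,
       r = r, lambda = lambda, df = df)
}

# One NA estimate (as mean() on a factor produces) makes every component
# that depends on Q NA; ubar depends only on U and is still defined.
bad <- rubin_pool(Q = c(NA, 0.2, 0.3), U = c(0.01, 0.01, 0.01))
bad$qbar
bad$ubar

# With numeric estimates, every component is defined.
ok <- rubin_pool(Q = c(0.20, 0.24, 0.28), U = c(0.006, 0.008, 0.010))
ok$t
```

The `ok` call shows that once `Q` holds numeric estimates, all components are finite. The sketch omits `fmi`, which `pool.scalar()` also reports.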

How can I modify my code so that it gives values for all the components returned by pool.scalar?

Thanks!

Edited by Jaap; asked by dixi.
  • Two things: First, please clarify what you mean by "proportion of each factor with standard errors." Do you want the mean of a dichotomous variable, or is that supposed to extend to polytomous factors? Second, your code produces several errors and warnings. Most importantly, `mean` and `var` are not useful for factors. For the mean of a dichotomous (!) factor, try `as.numeric(complete(imp, i)$hyp)-1` in the definition of `Q` and `U` in your loop. – SimonG Feb 24 '17 at 16:04
  • Yes, I agree: mean and var are for continuous variables. I got the code from the manual and I do not know how to change it. By proportion I meant the percentage of each factor level. For example, variable 1 is gender (male and female), and I want to calculate the proportions of males and females as well as their standard errors. – dixi Feb 24 '17 at 16:14
  • If it's for dichotomous factors like sex (male, female), then use `as.numeric(complete(imp, i)$hyp)-1` instead of just `complete(imp, i)$hyp`. This converts the factor to a numeric data type before aggregation, thus avoiding errors and `NA`s. – SimonG Feb 24 '17 at 16:50
  • How about factors with more than two levels? – dixi Feb 24 '17 at 16:54
  • To get the proportion for each level of a factor with >2 levels, you can refer to each level individually. For example, to get the proportion of cases with `hyp==1`, you can calculate the `Q` and `U` on the basis of `complete(imp, i)$hyp==1` (which is dichotomous again). – SimonG Feb 24 '17 at 17:35
  • I see, so I have to do it per level. I think this will suffice. Thank you, Simon! – dixi Feb 24 '17 at 18:16
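Following SimonG's suggestion, each level can be treated as its own 0/1 indicator. For a two-level factor, `as.numeric(x) - 1` is the indicator for the second level. The sketch below assumes the m completed data frames are already in a list, for example `completed <- lapply(seq_len(imp$m), function(i) complete(imp, i))`. The function name `pool_level_props` is hypothetical, and the toy data frames stand in for real imputations. It applies only the Rubin (1987) formulas for the pooled estimate and total variance, so its standard errors are intended to match `sqrt(t)` from `pool.scalar(Q, U)` run on the same indicator; check this against your mice version.

```r
# Pool the proportion of each factor level across m completed data sets.
# pool_level_props() is a hypothetical helper, not a mice function.
pool_level_props <- function(completed, variable) {
  m    <- length(completed)
  levs <- levels(completed[[1]][[variable]])
  res  <- lapply(levs, function(lev) {
    # Q[i], U[i]: estimate and squared SE of P(variable == lev) in data set i
    Q <- U <- numeric(m)
    for (i in seq_len(m)) {
      x    <- as.numeric(completed[[i]][[variable]] == lev)  # 0/1 indicator
      Q[i] <- mean(x)
      U[i] <- var(x) / length(x)
    }
    t <- mean(U) + (1 + 1 / m) * var(Q)  # Rubin's total variance
    data.frame(level = lev, prop = mean(Q), se = sqrt(t))
  })
  do.call(rbind, res)
}

# Toy stand-ins for m = 2 completed data sets (not real imputations).
toy <- list(
  data.frame(g = factor(c("a", "b", "b", "c"), levels = c("a", "b", "c"))),
  data.frame(g = factor(c("a", "a", "b", "c"), levels = c("a", "b", "c")))
)
props <- pool_level_props(toy, "g")
props
```

With the question's data this would be called as `pool_level_props(completed, "hyp")`. The within-imputation variance keeps the question's `var(x) / length(x)`, which uses an n - 1 denominator and differs from p(1 - p)/n only by a factor of n/(n - 1).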
