Hi, I am trying to impute missing values using the available values in the same group. Here is some example data:
dput(question)
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L
), .Label = c("A", "B"), class = "factor"), Year = c(2004L, 2005L,
2006L, 2007L, 2006L, 2007L, 2008L), Score = c(NA, 100L, NA, 95L,
NA, NA, 88L)), .Names = c("Group", "Year", "Score"), class = "data.frame", row.names = c(NA,
-7L))
For group A, the NA in 2004 should take the available value from the closest year in the same group (100, from 2005), and the NA in 2006 should be the average of the 2005 and 2007 scores in group A. For group B, the NAs in 2006 and 2007 should both take the 2008 value for group B (88).
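Taken together, these rules amount to nearest-year imputation within each group, averaging the two neighbours when an observed year is equally close on both sides. Below is a minimal base-R sketch of that rule. It assumes Year is unique within a group and borrows only from values observed in the original data. The helper name `nearest_fill` is hypothetical and does not come from any package.

```r
# Example data from the question, rebuilt with data.frame() (same values as the dput() output).
question <- data.frame(
  Group = factor(c("A", "A", "A", "A", "B", "B", "B")),
  Year  = c(2004L, 2005L, 2006L, 2007L, 2006L, 2007L, 2008L),
  Score = c(NA, 100L, NA, 95L, NA, NA, 88L)
)

# Hypothetical helper: fill each NA in y with the observed value at the nearest x.
# If observed values are equally near on both sides, use their mean.
# Only originally observed values act as donors (imputed values are not reused).
nearest_fill <- function(x, y) {
  obs <- !is.na(y)
  if (!any(obs)) return(y)               # nothing observed in this group: leave as is
  for (i in which(!obs)) {
    d <- abs(x[obs] - x[i])              # distances to the observed years
    y[i] <- mean(y[obs][d == min(d)])    # nearest value(s), averaged on ties
  }
  y
}

# Apply the rule separately within each group.
result <- question
for (g in levels(question$Group)) {
  rows <- question$Group == g
  result$Score[rows] <- nearest_fill(question$Year[rows], question$Score[rows])
}
result
```

For the example data this gives Scores of 100, 100, 97.5, 95 for group A and 88, 88, 88 for group B.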
Is there an R package for imputation that handles cases like these, or do you have any other suggestions for this kind of imputation?
Any help is really appreciated.
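A package-free alternative is linear interpolation on Year within each group, using base R's `approx()` with `rule = 2` so that the first or last observed value is carried outward at the ends. This is a swapped-in technique, not the nearest-year rule. For example, with 100 observed in 2005 and 94 in 2008, interpolation gives 98 and 96 for 2006 and 2007, whereas the nearest-year rule gives 100 and 94. The helper name `interp_fill` is hypothetical. If you prefer a package, zoo's `na.approx()` provides the same kind of linear interpolation for series with NAs.

```r
question <- data.frame(
  Group = factor(c("A", "A", "A", "A", "B", "B", "B")),
  Year  = c(2004L, 2005L, 2006L, 2007L, 2006L, 2007L, 2008L),
  Score = c(NA, 100L, NA, 95L, NA, NA, 88L)
)

# Hypothetical helper: linear interpolation of y over x using only observed points.
# rule = 2 returns the nearest end value outside the observed range instead of NA.
interp_fill <- function(x, y) {
  obs <- !is.na(y)
  if (sum(obs) == 0) return(y)                       # nothing observed: leave as is
  if (sum(obs) == 1) return(rep(y[obs], length(y)))  # approx() needs at least two points
  approx(x[obs], y[obs], xout = x, rule = 2)$y
}

question$Score_interp <- NA_real_
for (g in levels(question$Group)) {
  rows <- question$Group == g
  question$Score_interp[rows] <- interp_fill(question$Year[rows], question$Score[rows])
}
question
```

On the example data this happens to give the same numbers as the nearest-year rule (100, 100, 97.5, 95 and 88, 88, 88), because no gap is longer than one year between two observed values.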
Update: I amended PsyNeuroSci’s function so that the distance is calculated using Year. (Sorry, I did not know how to post the amended code under PsyNeuroSci’s answer.)
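impute_nearest <- function(dat, var0, var) {
  # var0: name of the position column (here "Year"); var: name of the column to impute.
  # Group is not used, so call this on one group's rows at a time.
  for (i in seq_along(dat[, var])) {
    if (is.na(dat[, var][i])) {
      na.pos     <- dat[, var0][i]                          # year of the missing value
      non.na.pos <- dat[, var0][which(!is.na(dat[, var]))]  # years with a value (including ones imputed earlier in this loop)
      distance   <- min(abs(na.pos - non.na.pos))           # gap to the nearest such year
      # average the values exactly `distance` years before and after
      dat[, var][i] <- mean(c(dat[which(dat[, var0] == (na.pos + distance)), var],
                              dat[which(dat[, var0] == (na.pos - distance)), var]),
                            na.rm = TRUE)
    }
  }
  return(dat)
}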