
I would like to delete some of the entries in my data frame and then impute them from the remaining information using the aregImpute function. However, when I randomly delete 25% of the data in some of the columns, some columns are left with only a single distinct value (i.e. each of them is equal to a constant n). I then get the following error:

Error in aregImpute(fmla, data = df_pmm_imp, n.impute = 5, nk = 0) : 
  X01154H.exp is constant

Here is a reproducible example:

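library(Hmisc)  # provides aregImpute()
set.seed(123)   # set the seed before any sampling so the example is reproducible
df = data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))
df[, 10] = 0    # make the last column constant
smp_size = floor(0.25 * nrow(df))
missing_ind = sample(seq_len(nrow(df)), size = smp_size)
df[missing_ind, 6:10] = NA  # delete 25% of the rows in columns 6 to 10
fmla = as.formula(paste(" ~ ", paste(names(df), collapse = " + ")))
impute_arg = aregImpute(fmla, data = df, n.impute = 5, nk = 0)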

# Error in aregImpute(fmla, data = df, n.impute = 5, nk = 0) : 
          X10 is constant

Is there a way to fix this problem? I understand that a constant column does not provide much information, so it may well cause problems. However, I don't think it should prevent the whole imputation. For instance, a better practice that comes to mind would be to assign that constant value to all of the missing values in the column.
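The idea of assigning the constant to the missing entries can be sketched in base R. This is only a minimal illustration: `fill_constant_columns` is a hypothetical helper name, not something provided by Hmisc, and it assumes "constant" means exactly one distinct non-missing value.

```r
# Hypothetical helper (not part of Hmisc): for every column that has missing
# values and exactly one distinct observed value, replace the NAs with that value.
fill_constant_columns <- function(data) {
  is_constant <- vapply(
    data,
    function(col) anyNA(col) && length(unique(col[!is.na(col)])) == 1,
    logical(1)
  )
  for (name in names(data)[is_constant]) {
    observed <- data[[name]][!is.na(data[[name]])]
    data[[name]][is.na(data[[name]])] <- observed[1]
  }
  data
}

# Small usage example: column b is constant apart from its NA, column a is not.
toy <- data.frame(a = c(1, NA, 3), b = c(0, NA, 0))
filled <- fill_constant_columns(toy)
```

In this toy case only `b` is filled in; `a` keeps its missing value, which would still be left for the actual imputation step.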

Thanks in advance.

  • You can delete the variable and then perform the imputation. It will work then ;) You can add it back to the data.frame in the end if you want. But what would you want with a constant variable anyway...? – Steffen Moritz Jan 20 '19 at 02:56
  • Among the many columns with missing values that I want to impute, some have only constant non-missing values. The most straightforward approach is to impute these columns with their unique observed value and proceed with this data for the rest of the imputation. However, I was hoping to find a way to do this within the function, without having to determine those columns and fill them myself before feeding the data to aregImpute(). – Elif Cansu Akoğuz Jan 20 '19 at 10:52
  • This is what I am trying to do now, but I keep getting errors. I tried the following: `df = df %>% map_if(is_unique(.), ~ rep(unique(na.omit(.)), length(.))) df = df %>% map_if(is_unique(.), ~ unique(na.omit(.))) df = df %>% mutate_if(is_unique(.), ~ unique(na.omit(.))) ` I keep getting this error: `Error in probe(.x, .p) : length(.p) == length(.x) is not TRUE` – Elif Cansu Akoğuz Jan 20 '19 at 10:53
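The workaround discussed in these comments (leave the constant columns out of the imputation, then fill them with their single observed value) might look roughly like the sketch below. The names `constant_cols` and `varying_cols` are illustrative, and it is an assumption that aregImpute runs cleanly once the constant column is dropped from the formula. As for the error in the last comment, it plausibly arises because `is_unique(.)` is evaluated on the whole data frame rather than passed as a predicate function, so `map_if` receives something whose length does not match the number of columns; the base R version below sidesteps that by computing one value per column with `vapply`.

```r
# Rebuild the example data from the question.
set.seed(123)
df <- data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))
df[, 10] <- 0
missing_ind <- sample(seq_len(nrow(df)), size = floor(0.25 * nrow(df)))
df[missing_ind, 6:10] <- NA

# Count distinct observed (non-missing) values per column.
n_observed_values <- vapply(
  df,
  function(col) length(unique(col[!is.na(col)])),
  integer(1)
)
constant_cols <- names(df)[n_observed_values == 1]
varying_cols <- setdiff(names(df), constant_cols)

# Build the formula from the non-constant columns only.
fmla <- as.formula(paste("~", paste(varying_cols, collapse = " + ")))

# Run the imputation only if Hmisc is installed (assumed to work once the
# constant column is excluded from the formula).
if (requireNamespace("Hmisc", quietly = TRUE)) {
  impute_arg <- Hmisc::aregImpute(fmla, data = df, n.impute = 5, nk = 0)
}

# Fill each constant column with its single observed value.
for (name in constant_cols) {
  df[[name]][is.na(df[[name]])] <- unique(df[[name]][!is.na(df[[name]])])
}
```

Detecting the constant columns still happens outside aregImpute(), but it is done programmatically, so the columns do not have to be picked out by hand.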
