I have encountered unexpected behaviour (unexpected to me, at least) when using tibbles with the brilliant mi package for missing-data imputation.

Let's assume a tibble called B. The offending command is :-

A <- missing_data.frame(B)

The resulting error message is :-

Error in .guess_type(y, favor_ordered, favor_positive, threshold, variable_name) [[Name of first variable in B]] must be a vector

This example reproduces the behaviour.

# Load the packages (mi for missing_data.frame, tibble for tibble)
library(mi)
library(tibble)

# Make the test data frames and tibbles
Numbers <- sample(1:200, 40)
Numbers2 <- sample(1:200, 40)
Numbers3 <- sample(1:200, 40)
Letters <- sample(letters, 40, replace = TRUE)

# Mixed numeric and character data
# stringsAsFactors is set explicitly because its default changed to FALSE in R 4.0.0
DF.test <- data.frame(Numbers, Letters, stringsAsFactors = TRUE)
str(DF.test) # Number, Factor

DF.test2 <- data.frame(Numbers, Letters, stringsAsFactors = FALSE)
str(DF.test2) # Number, Character

Tibble.test <- tibble(Numbers, Letters)
str(Tibble.test) # Number, Character

# Run the tests
DF.mdf <- missing_data.frame(DF.test) # Fine
DF2.mdf <- missing_data.frame(DF.test2) # Fine

Tibble.mdf <- missing_data.frame(Tibble.test) # ERROR
Tibble.mdf <- missing_data.frame(data.frame(Tibble.test)) # Fine

# Purely numeric data
Tibble.test2 <- tibble(Numbers, Numbers2, Numbers3)
str(Tibble.test2) # Number, Number, Number

# Run the tests
Tibble.mdf2 <- missing_data.frame(Tibble.test2) # ERROR
Tibble.mdf2 <- missing_data.frame(data.frame(Tibble.test2)) # Fine

It seems that mi objects to something in tibbles but not in data frames. The error message is unhelpful. The problem is easy to work around by coercing the tibble back to a data frame, but I don't see any mention of this issue in the documentation. I am wholly unfamiliar with the innards of mi.

Am I missing something basic, or something in the documentation, or is this genuinely unexpected behaviour? All assistance, comments and interpretations are welcomed.

Cœur
astaines
  • Tibbles are not always compatible with data.frames. Hadley decided on a hard break from the data.frame syntax for various reasons, in part to make outputs more consistent. There are a number of questions on SO that are due to this incompatibility. Some packages rely on having a data.frame as input, and this is likely one of those cases. In particular, `dat[, 1]` returns a vector if dat is a data.frame but returns a tibble if dat is a tibble. This seems to be the cause of the error, as `.guess_type` seems to want a vector. – lmo Nov 06 '17 at 16:25
  • https://github.com/cran/mi/blob/master/R/missing_data.frame.R Looks like it explicitly casts matrices and lists using `as.data.frame()`, but it doesn't have any method for tibbles. – David Klotz Nov 06 '17 at 16:28
  • Thank you both very much - an excellent explanation, and it draws my attention to some key differences between tibbles and data frames. – astaines Nov 07 '17 at 08:39
  • Rather than mark the question as solved in the question title, post the solution as an answer! Answering your own question is not forbidden (there is even an option to answer the question directly at the [Ask a Question](https://stackoverflow.com/questions/ask) page) – Filnor Nov 07 '17 at 08:48
  • Good suggestion - thank you! – astaines Nov 08 '17 at 20:46

1 Answer

The answer is this :-

Tibbles and data frames behave almost identically, but not quite. One part of data frame behaviour that tibbles change is the column subset operator.

D[, 1] returns a vector if D is a data.frame, but returns a tibble if D is a tibble. The mi package wants a vector, and so complains.
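
A minimal sketch of the subsetting difference, assuming only that the tibble package is installed. The objects df, tbl and plain are made up for illustration, and nothing here calls mi itself, so it shows the behaviour described above rather than confirming what mi does internally:

```r
library(tibble)

# The same two columns, once as a data.frame and once as a tibble
df  <- data.frame(x = 1:3, y = c("a", "b", "c"))
tbl <- tibble(x = 1:3, y = c("a", "b", "c"))

# Single-column subsetting with `[`
is.vector(df[, 1])       # TRUE: a data.frame drops to a plain vector
is.data.frame(tbl[, 1])  # TRUE: a tibble stays a one-column tibble

# Getting a plain vector out of a tibble
tbl[[1]]
tbl$x

# Coercing back to a plain data.frame, as in the workaround in the question
plain <- as.data.frame(tbl)
class(plain)             # "data.frame"
is.vector(plain[, 1])    # TRUE again after coercion
```

Coercing with as.data.frame() (or data.frame(), as in the question) before handing the data to a package that expects data.frame semantics is the least intrusive workaround, since it leaves the original tibble untouched.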

Thanks to lmo and David Klotz for this answer.

astaines