
When reading CSV files with the readr package, Date values are stored as integers. By that I don't mean the class of the date column; I mean the underlying value R stores for each date, as shown by dput(). This breaks the dplyr join functions when one data frame's dates are stored as doubles and the other's as integers. I've included a reproducible example below. Is there anything I can do to prevent this behavior?

library(readr)
library(dplyr)  # needed for full_join() below

df1 <- data.frame(
  Date = as.Date(c("2012-11-02", "2012-11-04", "2012-11-07", "2012-11-09", "2012-11-11")),
  Text = c("Why", "Does", "This", "Happen", "?"),
  stringsAsFactors = FALSE
)
class(df1$Date)
# [1] "Date"
dput(df1$Date[1])
# structure(15646, class = "Date")

# Write to dummy csv
write.csv(df1, file = "dummy_csv.csv", row.names = FALSE)

# Read back in data using both read.csv and read_csv
df2 <- read.csv("dummy_csv.csv", as.is = TRUE, colClasses = c("Date", "character"))
df3 <- read_csv("dummy_csv.csv")

# Examine structure of date values
class(df2$Date)
# [1] "Date"
class(df3$Date)
# [1] "Date"

dput(df2$Date[1])
# structure(15646, class = "Date")
dput(df3$Date[1])
# structure(15646L, class = "Date")

# Try to join using dplyr joins
both <- full_join(df2, df3, by = c("Date"))
# Error: cannot join on columns 'Date' x 'Date': Cant join on 'Date' x 'Date' because of incompatible types (Date / Date)

# Base merge works
both2 <- merge(df2, df3, by = "Date")

# Converting a POSIXct object to Date also gives a value stored as a double
temp_date <- as.Date(as.POSIXct("11OCT2012:19:00:00", format = "%d%b%Y:%H:%M:%S"))
dput(temp_date)
# structure(15624, class = "Date")

Judging by this issue on the dplyr repo, Hadley seems to consider this a feature. But whenever the date values in two data frames are stored differently, you can't join on them, and I haven't figured out how to convert an integer-backed Date to a double-backed one. Is there any way to stop readr from doing this, or any way to convert a Date stored as an integer to one stored as a double?

Matt Mills

1 Answer


According to Hadley himself, this is a bug in dplyr, not readr. He says that storing dates as integer rather than numeric values when reading files is fine, but dplyr should be able to handle the difference the way merge does.
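In the meantime, one possible workaround is to rebuild the integer-backed Date on double storage before joining. The following is only a sketch in base R: as_double_date() is a hypothetical helper name, not something provided by readr or dplyr, and it assumes the column really does inherit from "Date".

```r
# Hypothetical helper (not part of readr or dplyr): rebuild a Date so that
# its underlying storage is double rather than integer.
as_double_date <- function(x) {
  stopifnot(inherits(x, "Date"))
  # as.numeric() drops the class and returns the day count as a double;
  # as.Date() then re-attaches the Date class on top of that double.
  as.Date(as.numeric(x), origin = "1970-01-01")
}

# An integer-backed Date, like the one read_csv() returned in the question
int_date <- structure(15646L, class = "Date")
typeof(int_date)
# [1] "integer"

dbl_date <- as_double_date(int_date)
typeof(dbl_date)
# [1] "double"
```

Applied to the example from the question, this would be df3$Date <- as_double_date(df3$Date) before calling full_join(df2, df3, by = "Date"). The calendar dates themselves are unchanged; only the storage type differs.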

Matt Mills