When reading in csv files using the readr
package date objects are stored as integer values. When I say stored as integer I don't mean the class of the date column, I mean the underlying date value R stores. This prevents the ability to use the dplyr
join functions if one data frame's dates are stored as numeric values and the other's are integer. I've included a reproducible example below. Is there anything I can do to prevent this behavior?
library(readr)
df1 <- data.frame(Date = as.Date(c("2012-11-02", "2012-11-04", "2012-11-07", "2012-11-09", "2012-11-11")), Text = c("Why", "Does", "This", "Happen", "?"), stringsAsFactors = F)
class(df1$Date)
# [1] "Date"
dput(df1$Date[1])
# structure(15646, class = "Date")
# Write to dummy csv
write.csv(df1, file = "dummy_csv.csv", row.names = F)
# Read back in data using both read.csv and read_csv
df2 <- read.csv("dummy_csv.csv", as.is = T, colClasses = c("Date", "character"))
df3 <- read_csv("dummy_csv.csv")
# Examine structure of date values
class(df2$Date)
# [1] "Date"
class(df3$Date)
# [1] "Date"
dput(df2$Date[1])
# structure(15646, class = "Date")
dput(df3$Date[1])
# structure(15646L, class = "Date")
# Try to join using dplyr joins
both <- full_join(df2, df3, by = c("Date"))
Error: cannot join on columns 'Date' x 'Date': Cant join on 'Date' x 'Date' because of incompatible types (Date / Date)
# Base merge works
both2 <- merge(df2, df3, by = "Date")
# converting a POSIXlt object to Date is also stored as numeric
temp_date <- as.Date(as.POSIXct("11OCT2012:19:00:00", format = "%d%b%Y:%H:%M:%S"))
dput(temp_date)
# structure(15624, class = "Date")
Judging by this issue on the dplyr
repo it seems like Hadley thinks this is a feature but any time your date values are stored differently you can't merge on them, and I haven't figured out a way to convert the integer date object to a numeric one. Is there anyway to stop the readr package from doing this or any way to convert a Date object stored as an integer to a numeric value?