0

I am trying to convert multiple date columns (with different format) in data.table. There are few method already available. One of the link Efficiently convert a date column in data.table. I was trying using mapply. But got the following error:

Error in [.data.table(df, , :=((paste0(dtVar, "")), mapply(function(x, : Supplied 12 items to be assigned to 6 items of column 'X1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

library(data.table)

# sample data
df <- data.table(
  X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
  X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
  stringsAsFactors = FALSE)


# convert date columns
dtVar <- c("X1", "X2")
inDtFmt <- c("%Y-%m-%d","%d/%m/%Y")

df[,(dtVar) := mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)]

## Further investigation
mm <- mapply(function(x,y){strptime(df[[x]], format = y)}, dtVar, inDtFmt)

str(mm)
List of 2
# $ X1: POSIXlt[1:6], format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: POSIXlt[1:6], format: "1996-01-02" "1996-01-03" "1996-01-04" "1996-01-05" ...

Can someone tell me why I am getting this error?

Waldi
  • 39,242
  • 6
  • 30
  • 78
Raja
  • 157
  • 1
  • 11

4 Answers4

2

mapply usually tries to simplify result into a vector, you should use Map instead. strptime returns object of class POSIXlt, here you need only date so use as.Date.

Moreover, if you use lubridate::parse_date_time you can do this with lapply.

library(data.table)

df[, (dtVar) := lapply(.SD, lubridate::parse_date_time, inDtFmt), .SDcols = dtVar]
df

#           X1         X2
#1: 1996-01-04 1996-01-02
#2: 1996-01-05 1996-01-03
#3: 1996-01-08 1996-01-04
#4: 1996-01-09 1996-01-05
#5: 1996-01-10 1996-01-08
#6: 1996-01-11 1996-01-09
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

You can do using as.IDate:

df <- df[,as.list(Map(function(x,y){as.IDate(.SD[[x]], format = y)}, dtVar, inDtFmt))]
print(df)

           X1         X2
1: 1996-01-04 1996-01-02
2: 1996-01-05 1996-01-03
3: 1996-01-08 1996-01-04
4: 1996-01-09 1996-01-05
5: 1996-01-10 1996-01-08
6: 1996-01-11 1996-01-09
YOLO
  • 20,181
  • 5
  • 20
  • 40
1

We could use anydate from anytime which can automatically pick up the format and change to Date class

library(data.table)
library(anytime)
df[, (dtVar) := lapply(.SD, anydate), .SDcols = dtVar]

str(df)
#Classes ‘data.table’ and 'data.frame': 6 obs. of  2 variables:
# $ X1: Date, format: "1996-01-04" "1996-01-05" "1996-01-08" "1996-01-09" ...
# $ X2: Date, format: "1996-02-01" "1996-03-01" "1996-04-01" "1996-05-01" ...
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Here is one way to do it:


library(data.table)

# sample data
df <- data.table(
  X1 = c("1996-01-04", "1996-01-05", "1996-01-08", "1996-01-09", "1996-01-10", "1996-01-11"), 
  X2 = c("02/01/1996", "03/01/1996", "04/01/1996", "05/01/1996", "08/01/1996", "09/01/1996"), 
  stringsAsFactors = FALSE)

str(df)

dtFmt <- list(X1 = "%Y-%m-%d", X2 = "%d/%m/%Y")

for (col in names(df)) {
  df[[col]] <- as.Date(df[[col]],  dtFmt[[col]]) 
}

str(df)

David J. Bosak
  • 1,386
  • 12
  • 22
  • Welcome to Stack Overflow. Please don't post code-only answers, you should include an explanation of how this code works and answers the question. This makes it more useful to the OP as well as other users with similar questions, so is more likely to be upvoted. – FluffyKitten Jul 26 '20 at 17:03