1

I have a data and there is a column called Date, I input the data into R.
Here is my data:

unique(data$Date)
 [1] ""           "2016/12/20" "2016/12/27" "2017/1/7"   "2017/1/27"  "2017/2/1"   "2017/2/2"   "2017/2/5"   "2017/2/6"   "2017/2/7"  
[11] "2017/2/8"   "2017/2/10"  "2017/2/11"  "2017/2/13"  "2017/2/14"  "2017/2/15"  "2017/2/17"  "2017/2/16"  "2017/2/24"  "2017/2/19" 
[21] "2017/2/21"  "2017/2/20"  "2017/2/26"  "2017/2/22"  "2017/3/2"   "2017/2/25"  "2017/2/28"  "2017/3/1"   "2017/3/4"   "2017/3/5"  
[31] "2017/3/6"   "2017/3/10"  "2017/3/8"   "2017/3/9"   "2017/3/11"  "2017/3/12"  "2017/3/13"  "2017/3/15"  "2017/3/29"  "2017/5/13" 
[41] "2015/10/5"  "2016/2/22"  "2015/3/6"   "2015/3/7"   "2015/10/15" "2015/3/9"   "2016/1/30"  "2015/10/29" "2015/10/24" "2015/10/17"
[51] "2016/1/8"   "2015/9/24"  "2016/2/15"  "2015/12/8"  "2015/12/10" "2016/2/6"   "2015/11/29" "2016/1/23"  "2015/10/11" "2016/2/16" 
[61] "2015/9/28"  "2016/1/29"  "2015/11/27" "2015/10/12" "2015/11/1"  "2015/11/16" "2015/10/10" "2015/11/30" "2016/1/2"   "2016/1/21" 
[71] "2016/4/22"  "2015/10/21" "2015/11/12" "2015/12/28" "2015/12/30" "2015/11/6"  "2015/10/8"  "2015/12/6"  "2016/1/24"  "2016/1/17" 
[81] "2016/2/26"  "2016/3/6"   "2016/2/17"  "2016/1/11"  "2015/12/3"  "2016/2/11"  "2015/11/22" "2015/10/2"  "2015/10/3"  "2015/11/4" 
[91] "2016/2/10"  "2015/12/9"  "2015/10/9"  "2015/12/1"  "2016/2/25"  "2016/1/19"  "2016/1/18"  "2015/12/13" "2016/2/14"  "2016/3/10" 

class(data$Date)
[1] "character"

I change this character to date format with as.Date():

data$Date <- as.Date(data$Date)
Error in charToDate(x) : 
character string is not in a standard unambiguous format

I don't know how to figure out. I think the problem is "" in the data. I also has another column called Date2 but this column doesn't contain "" in it.
Any suggestion?

Also, If I want to do as.Date with two columns at the same time and define specified format like `as.Date(x, "%Y/%m/%d"), how can I do?

data[,c("Date", "Date2") := lapply(.SD, as.Date), .SDcols = c("Date", "Date2")]
Peter Chen
  • 1,464
  • 3
  • 21
  • 48
  • 4
    Specify the format of the date. `as.Date(x, "%Y/%m/%d")` – Ronak Shah Jul 21 '17 at 02:00
  • @RonakShah - I think it should automatically recognise that format – SymbolixAU Jul 21 '17 at 02:01
  • @SymbolixAU Ideally, it should because `as.Date("2017/01/01")` returns proper date. Maybe the digit precision matter ? – Ronak Shah Jul 21 '17 at 02:03
  • @RonakShah Thanks. it works. However, I don't know why it needs to add `"%Y/%m/%d"`. – Peter Chen Jul 21 '17 at 02:04
  • It has to do with the first case being `""` - compare `as.Date(c("","2015/10/5"))` vs `as.Date(c("2015/10/5"))`. R can not read minds (yet) unfortunately. – thelatemail Jul 21 '17 at 02:06
  • yep. I got it. It works even for `as.Date("2017/1/1")` without any `format` but it is because you have one empty value in the beginning which makes it important to mention the format. – Ronak Shah Jul 21 '17 at 02:06
  • 1
    Maybe I can change `""` to `NA` at first and then change to `date` format? Is that possible? – Peter Chen Jul 21 '17 at 02:09
  • 1
    So does `as.Date` check the first item to 'guess' the format, and then apply that to each subsequent one? Because `as.Date(c("2015/10/5", ""))` is fine. But if the first one is `""`, then it has issues? – SymbolixAU Jul 21 '17 at 02:09
  • 2
    @SymbolixAU - you can see this checking if you type `as.Date.character` in the console and inspect the code. It seems to do what you are guessing it does. – thelatemail Jul 21 '17 at 02:14
  • 1
    For the updated question, `data[, `:=`(Date = as.Date(Date), Date2 = as.Date(Date2))]` should do (with back-ticks around the `:=`) – SymbolixAU Jul 21 '17 at 02:18
  • @SymbolixAU yes, it works. However, my code cannot work. I don't know why. – Peter Chen Jul 21 '17 at 02:23
  • 1
    Have you specified the `"%Y/%m/%d"` format? – SymbolixAU Jul 21 '17 at 02:24

2 Answers2

2

You can use the ymd() function in the lubridate package to convert the dates, and the empty strings will be converted to NA. For example,

> library(lubridate)
> (newdates <- ymd(dates))
   [1] NA           "2016-12-20" "2016-12-27" "2017-01-07" "2017-01-27" "2017-02-01" "2017-02-02" "2017-02-05"
   [9] "2017-02-06" "2017-02-07" "2017-02-08" "2017-02-10" "2017-02-11" "2017-02-13" "2017-02-14" "2017-02-15"
   [17] "2017-02-17" "2017-02-16" "2017-02-24" "2017-02-19" "2017-02-21" "2017-02-20" "2017-02-26" "2017-02-22"
   [25] "2017-03-02" "2017-02-25" "2017-02-28" "2017-03-01" "2017-03-04" "2017-03-05" "2017-03-06" "2017-03-10"
   [33] "2017-03-08" "2017-03-09" "2017-03-11" "2017-03-12" "2017-03-13" "2017-03-15" "2017-03-29" "2017-05-13"
   [41] "2015-10-05" "2016-02-22" "2015-03-06" "2015-03-07" "2015-10-15" "2015-03-09" "2016-01-30" "2015-10-29"
   [49] "2015-10-24" "2015-10-17" "2016-01-08" "2015-09-24" "2016-02-15" "2015-12-08" "2015-12-10" "2016-02-06"
   [57] "2015-11-29" "2016-01-23" "2015-10-11" "2016-02-16" "2015-09-28" "2016-01-29" "2015-11-27" "2015-10-12"
   [65] "2015-11-01" "2015-11-16" "2015-10-10" "2015-11-30" "2016-01-02" "2016-01-21" "2016-04-22" "2015-10-21"
   [73] "2015-11-12" "2015-12-28" "2015-12-30" "2015-11-06" "2015-10-08" "2015-12-06" "2016-01-24" "2016-01-17"
   [81] "2016-02-26" "2016-03-06" "2016-02-17" "2016-01-11" "2015-12-03" "2016-02-11" "2015-11-22" "2015-10-02"
   [89] "2015-10-03" "2015-11-04" "2016-02-10" "2015-12-09" "2015-10-09" "2015-12-01" "2016-02-25" "2016-01-19"
   [97] "2016-01-18" "2015-12-13" "2016-02-14" "2016-03-10"
> is.Date(newdates)
 [1] TRUE    
jdb
  • 147
  • 7
2

As established in the comments, the answer is that you need to specify the format of the date you're converting. Which in your case is "%Y/%m/%d"

data$Date <- as.Date(data$Date, "%Y/%m/%d")

Explanation

The reason you need to do this is because the first entry in your vector is "", AND you haven't specified the format.

The as.Date function, when applied on a character, first checks to see if the format argument is missing. If it is, it tries to guess the format based on the first element of the vector.

It tests for both "%Y-%m-%d" AND "%Y/%m/%d" formats by doing

xx <- ""
strptime(xx, "%Y-%m-%d")
NA
strptime(xx, "%Y/%m/%d")
NA

More specifically, it uses the following test (where xx is the first element of your vector)

if(is.na(xx) || 
    !is.na(strptime(xx, f <- "%Y-%m-%d", tz = "GMT")) || 
    !is.na(strptime(xx, f <- "%Y/%m/%d", tz = "GMT"))){
  print("success!")     ## I added this print statement for illustration purposes
}else{
  stop("character string is not in a standard unambiguous format")
}

So as you can see, xx evaluates to FALSE in all the if conditions, so the function has to enter the stop method.

To demonstrate, see the results of these statements

as.Date(c("2015/10/5", ""))  
# [1] "2015-10-05" NA
## SUCCESS, because it can 'guess' the first entry's format

as.Date(c("", "2015/10/5"))  
## ERROR: can't 'guess' the first entry's format

as.Date(c("2015/10/5", ""), format = "%Y/%m/%d") 
# [1] "2015-10-05" NA
## SUCCESS, because you've specified the format

as.Date(c("2015-10/5", "")) 
## ERROR: you haven't specified the format, 
## AND it's not one of the 'guessed' options ("%Y-%m-%d", "%Y/%m/%d")
SymbolixAU
  • 25,502
  • 4
  • 67
  • 139