I'm trying to convert a column of dates into strings, because I want to use them as factor levels at some later point in my code.

The date column is part of a tibble, and is of class Date. I figured that a simple as.character() conversion would do the trick, but unfortunately I was wrong. Instead of neatly formatted strings it returns a number in string form. For example, today (22 November 2017) comes out as "17492". So somewhere in the process the date gets converted into its numeric format and only then turned into a character string.

Now I did find a workaround, by unlisting the data, converting it again to dates and then to character strings, but it is fairly inefficient.

Can anyone explain i) why this occurs and ii) whether there is an easier fix?

Below is a reproducible example:

    # Get the current system date
    foo <- Sys.Date()
    # Convert to a list
    foo <- as.list(foo)
    # The following then produces the number string:
    as.character(foo)
    #> [1] "17492"
    # The following code works but is a rather annoying workaround
    as.character(as.Date(unlist(foo), origin = as.Date("1970-01-01")))
    #> [1] "2017-11-22"
Maarten Punt
  • `lapply(foo, as.character)` – d.b Nov 22 '17 at 20:15
  • Is there a reason you are storing your dates in a list rather than a simple vector in your example? Normally columns in tibbles are stored as vectors. Does your reproducible example accurately reflect your real problem? – MrFlick Nov 22 '17 at 20:18
  • @d.b Great, thanks. Works like a breeze. – Maarten Punt Nov 22 '17 at 20:21
  • As for the "why", it's a general thing with using `as.character` on lists, not having to do with dates in particular. There's a very good duplicate I'm trying to find... – joran Nov 22 '17 at 20:24
  • If your dates are in column `date` of the tibble `mydata` you can also use `dplyr::mutate`. For example `mydata %>% mutate(datechar = as.character(date))`. – neilfws Nov 22 '17 at 20:25
  • Based on `?as.character`, it seems like running `as.character` on lists effectively does something like `unlist(lapply(foo, function(x) as.character(deparse(as.vector(x)))))` but I'll let someone more knowledgeable tackle the **why** – d.b Nov 22 '17 at 20:25
  • @MrFlick Aren't tibbles supposed to be lists? I first noted the problem for the tibble but then observed the same behavior for data.frames and lists and thought I'd ask the question in a broader sense. – Maarten Punt Nov 22 '17 at 20:27
  • @d.b Exactly, the "why" is mostly just because since lists could potentially hold anything as elements, `as.character` doesn't really make much sense in that context. – joran Nov 22 '17 at 20:27
  • @MaartenPunt Mr Flick means why are you storing the dates in a _list column_ rather than date column, specifically. List columns can be awkward. – joran Nov 22 '17 at 20:29
  • Finally found the question I was [thinking](https://stackoverflow.com/q/7591632/324364) of. – joran Nov 22 '17 at 20:30
  • @MaartenPunt tibbles and data.frames are basically lists, but typically lists of atomic vectors. So a list of vectors, rather than a list of lists. Often you wind up with lists with sloppy data manipulation rather than intentionally trying to have a list. With `tt<-tibble(date=c(Sys.Date(), Sys.Date()))`, you have a list with one column and that one column is an atomic Date vector `class(tt$date)`, not a list. And here `as.character(tt$date)` works just fine. – MrFlick Nov 22 '17 at 20:35
  • @joran I see. The structure of the data is basically inherited from how it was produced. I matched a number of strings with their first occurrence in an ordered dataset and then used that to extract the dates from a date column. In any case the example did replicate the problem, and d.b's solution solved it. I'll check out the other question – Maarten Punt Nov 22 '17 at 20:36
  • Seems like the basic issue is that running `as.character(foo)` where foo is a 1-column tibble is not the same as `as.character(foo$date)`. – neilfws Nov 22 '17 at 20:37
  • @MrFlick Oh now I see where I went wrong. I did miss the dollar sign. Most probably because the tibble in this case is a single column. I apologize for my stupidity. But at least I learned a lot of new stuff as to why. – Maarten Punt Nov 22 '17 at 20:43

1 Answer

Given the number of useful comments and the solutions they provided, I'll post a summary as an answer here.

The first thing to do if you run into this problem is to check whether you actually want to convert the full list, or a single column within the list, where that column is an ordinary vector. This was my underlying problem, as MrFlick and neilfws pointed out. I missed it because in my case the list was a one-column tibble, with the column named "date". Using as.character(foo) returned my "numeric string" "17492", but using as.character(foo$date) did exactly what it was supposed to do and returned "2017-11-22".
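The difference between the two calls can be sketched as follows. This is only an illustration: it uses a base data.frame named `mydata` (a made-up stand-in for my tibble, so that no extra package is needed) and a fixed date instead of Sys.Date().

```r
# Hypothetical one-column table standing in for the tibble from the question
mydata <- data.frame(date = as.Date("2017-11-22"))

# Converting the whole table treats it as a list of columns
whole_table <- as.character(mydata)

# Converting the column itself dispatches on the Date class
one_column <- as.character(mydata$date)

print(whole_table)  # the question reports the day count in string form here
print(one_column)   # the formatted date
```

The only change needed in my own code was the `$date` part.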

In case your list really is just a list, or a list of lists, d.b's solution works like a breeze: use lapply(foo, as.character) if you want a list back, or sapply(foo, as.character) if you want a character vector.
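A small sketch of the two variants, assuming a list of two Date elements built from a fixed date. The `vapply()` line is my own addition and not from the comments; it is a stricter alternative to `sapply()` that fails if an element does not convert to exactly one string.

```r
d <- as.Date("2017-11-22")
foo <- as.list(c(d, d + 1))   # list of two length-one Date elements

# lapply() keeps the list structure: a list of two strings
as_list <- lapply(foo, as.character)

# sapply() simplifies the result to a character vector
as_vec <- sapply(foo, as.character)

# vapply() does the same but checks that each result is a single string
as_vec_strict <- vapply(foo, as.character, character(1))

print(as_vec)
# The character vector can then be used for factor levels
date_factor <- factor(as_vec)
```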

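Now as to why this happens: the direct reason, as pointed out by d.b, is that when as.character() is given a list it does not use the Date method on each element. It effectively strips each element down to its underlying value and writes that out (deparses it), so a Date comes out as its internal number of days since 1970-01-01.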
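To see where the number comes from, here is a minimal sketch with a fixed date. It only shows that a Date is a number with a class attribute on top; it is not a description of the internal code path of as.character().

```r
d <- as.Date("2017-11-22")

# A Date is stored as the number of days since 1970-01-01
days <- unclass(d)
print(days)

foo <- as.list(d)

# On the list, the class of the element is not used for the conversion
whole <- as.character(foo)

# On the element itself, the Date method is used
each <- as.character(foo[[1]])

print(whole)
print(each)
```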

The deeper reason was pointed out by joran and in the related question he linked in the comments (https://stackoverflow.com/q/7591632/324364). In short: it usually does not make sense to convert a full list to a single data type, as a list can contain many. For example, as.numeric() on a list returns an error as soon as any element is not a length-one vector. The exception is as.character(), which always succeeds because it makes a full write-out of each element of the list (perhaps to keep records).

Maarten Punt