So I am trying this code, which I have used in the past with other data wrangling tasks with no errors:

## Create an age_at_enrollment variable, based on the start_date per individual (i.e. I want to know an individual's age, when they began their healthcare job).

complete_dataset_1 = complete_dataset %>% mutate(age_at_enrollment = (as.Date(start_date)-as.Date(birth_date))/365.25)

However, I keep receiving this error message: "Error in charToDate(x) : character string is not in a standard unambiguous format"

I believe this error is happening because in the administrative dataset that I am using, the start_date and birth_date variables are formatted in an odd way:

start_date    birth_date
2/5/07 0:00   2/28/1992 0:00

I could not find an explanation for why the data is formatted that way. Any thoughts on how to fix this issue without altering the original administrative dataset?

Ronak Shah
maldini425

1 Answer

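as.Date cannot guess the layout of these strings, for example whether the day or the month comes first. To resolve this, pass the format parameter to as.Date. Note that in your sample, start_date has a two-digit year (%y) while birth_date has a four-digit year (%Y); the trailing time (0:00) is ignored because the format does not mention it:

library(dplyr)

# %y = two-digit year (start_date), %Y = four-digit year (birth_date)
# as.numeric() drops the misleading "days" label after dividing by 365.25
complete_dataset_1 = complete_dataset %>%
    mutate(age_at_enrollment = as.numeric(
        as.Date(start_date, format="%m/%d/%y") -
        as.Date(birth_date, format="%m/%d/%Y")) / 365.25)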
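
Before running this on the whole dataset, it may help to check the format strings against the two sample values from the question. This is a minimal sketch in base R; the variable names (`start_raw`, `birth_raw`, and so on) are made up for illustration, and it assumes every row follows the same layout as the sample.

```r
# Sample values copied from the question.
start_raw <- "2/5/07 0:00"
birth_raw <- "2/28/1992 0:00"

# %y reads a two-digit year, %Y reads a four-digit year.
# as.Date() ignores trailing characters that the format does not cover,
# so the " 0:00" part does not need to be stripped first.
start_parsed <- as.Date(start_raw, format = "%m/%d/%y")
birth_parsed <- as.Date(birth_raw, format = "%m/%d/%Y")

print(start_parsed)  # 2007-02-05
print(birth_parsed)  # 1992-02-28

# Rows that do not match the format come back as NA, which is worth counting
# on the real columns, e.g. sum(is.na(as.Date(x, format = "%m/%d/%y"))).
```

If the real columns mix two-digit and four-digit years, a single format string will not cover both, and the NA count will show that.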

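A more precise way to calculate the difference in years, accounting for leap years, is to use the lubridate package. Pass an interval rather than a difftime, because time_length() only takes the actual calendar length of each year into account for intervals:

library(dplyr)
library(lubridate)

# interval(start, end): the birth date comes first, the job start date second
complete_dataset_1 = complete_dataset %>%
    mutate(age_at_enrollment = time_length(interval(
        as.Date(birth_date, format="%m/%d/%Y"),
        as.Date(start_date, format="%m/%d/%y")), "years"))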
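
If only completed years are needed (25, 26, 27, and so on), one possible approach is to round the lubridate result down with `floor()`. The sketch below uses the sample values from the question; the names `birth`, `start`, `age_exact`, and `age_whole` are illustrative, not part of the original dataset.

```r
library(lubridate)

# Sample values from the question, parsed with explicit formats.
birth <- as.Date("2/28/1992 0:00", format = "%m/%d/%Y")
start <- as.Date("2/5/07 0:00", format = "%m/%d/%y")

# An interval keeps both calendar endpoints, so time_length() can count
# years by the calendar instead of dividing by a fixed number of days.
age_exact <- time_length(interval(birth, start), "years")

# floor() keeps only completed years, i.e. the age at the last birthday.
age_whole <- floor(age_exact)

print(age_whole)  # 14 (the 15th birthday, 2007-02-28, had not yet passed)
```

Inside the pipeline, the same idea would be wrapping the `time_length(...)` call in `floor()` within `mutate()`.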
Tim Biegeleisen
  • Thanks a lot! However, I am curious: I am only looking for an age variable that tells me whether an individual is 25 years old, 26 years old, 27 years old, etc. What would be the best way to divide the difference between the two columns? Right now, the age variable that I just created is telling me an observation's age in days, which is more detailed than what I am looking for. – maldini425 Mar 22 '20 at 04:17
  • Counting the exact diff in years is tricky. The `lubridate` package offers a way to do it, [see here](https://stackoverflow.com/questions/15569333/get-date-difference-in-years-floating-point). – Tim Biegeleisen Mar 22 '20 at 04:23