
I've read the lubridate package manual and have queried Stack Overflow with a variety of permutations of my question but have come up with no answer to my specific problem.

What I'm trying to do is calculate age in months at time of event as the difference between date of birth and some specific event date.

As such, I imported a SAS dataset using the sas7bdat package and converted my SAS date variables (DOB and Event) to R objects using the following code:

df$DOB <- as.Date(df$DOB, origin="1960-01-01")
df$DOB1 <- ymd(df$DOB)

And same thing for the Event variable:

df$Event <- as.Date(df$Event, origin="1960-01-01")
df$Event1 <- ymd(df$Event)
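
For readers unfamiliar with SAS dates, here is a minimal sketch of the conversion above. It assumes the imported columns are plain numeric day counts since 1960-01-01 (the SAS date epoch); the vector `sas_days` is made up for illustration.

```r
# Hypothetical SAS date values: days since 1960-01-01, with one missing value.
sas_days <- c(0, 366, NA)

# as.Date() with an origin turns day counts into Date objects; NA stays NA.
dob <- as.Date(sas_days, origin = "1960-01-01")
print(dob)
```

Since the result is already a Date, the extra ymd() call should not be needed.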

However, there are some NA values for DOB. This is the code I want to use to calculate age (in months):

df$interval <- new_interval(df$DOB1,df$Event1)
df$Age1 <- df$interval %/% months(1)

I'm receiving the error:

Error in est[start + est * per < end] <- est[start + est * per < end] + : NAs are not allowed in subscripted assignments

What am I doing wrong? I've tried an if/else function but perhaps used it incorrectly.

Note for the SAS programmers out there: I'm trying to produce the same results as the following statement:

IF DOB ne . THEN Tage=Floor(intck('month',DOB,Event)-(Day(Event)<Day(DOB)));
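
The SAS statement above can be sketched in R roughly as follows. This is an illustration under assumptions, not a verified port: it takes intck('month', ...) to count the month boundaries crossed between the two dates, and the function name `age_in_months` and the sample dates are hypothetical.

```r
library(lubridate)

# Hypothetical helper mirroring the SAS formula:
# month boundaries crossed, minus 1 when the event day-of-month
# is earlier than the birth day-of-month.
age_in_months <- function(dob, event) {
  boundaries <- 12 * (year(event) - year(dob)) + (month(event) - month(dob))
  boundaries - as.integer(day(event) < day(dob))
}

# Made-up sample data, including a missing DOB and an end-of-month DOB.
dob   <- as.Date(c("1995-10-31", NA, "2010-01-15"))
event <- as.Date(c("1996-05-26", "2000-01-01", "2011-03-20"))

ages <- age_in_months(dob, event)
print(ages)
```

Because this is plain arithmetic on the year, month and day components, a missing DOB simply yields NA and no interval division is involved.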
Bugs
mcjudd
  • If you created a df with DOBs, then your code would be reproducible, and we can try to find the error more easily. As is, I cannot reproduce the error. When I do `new_interval(NA,"1962-01-01") %/% months(1)` I just get `NA`. – mgriebe Aug 18 '14 at 19:14
  • I found this answer to be more useful than any below: https://stackoverflow.com/questions/42994272/r-lubridate-convert-period-into-numeric-counting-months – MokeEire Oct 22 '21 at 18:40

4 Answers


Simple example using lubridate package

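library(lubridate)
date1 <- "20160101"
date2 <- "20160501"
int <- interval(ymd(date1), ymd(date2))
x <- int %/% months(1)   # whole months elapsed
print(x)
# answer : 4

Or, for intervals shorter than 12 months, the following gives the same result:

x <- month(as.period(int))   # months component of the period only
print(x)
# answer : 4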
hyunwoo jeong
  • what does `%/%` do? Is that part of the lubridate package? – rrr Aug 01 '18 at 16:38
  • `%/%` seems to be just rounding down to the nearest integer? Is it any different than `floor(x / months(1))`? And then my real question, could you do `ceiling(x / months(x))` to get round-up behavior? – rrr Aug 01 '18 at 16:49
  • The second solution may lead to an incorrect answer if the interval is > 12 months. Essentially, it will read it as `'y' years, 'm' months...` and return `m months`, whilst the correct answer should be `y*12 + m`. – skoh Jan 11 '19 at 15:46
  • You can also do `as.integer(ymd(date2)-ymd(date1))` to get the number of days between the two dates. – passerby51 Apr 06 '20 at 06:01
  • This solution is very strange. For example interval(ymd("2020-01-31"),ymd("2020-03-30")) %/% months(1) gives 1 month, which is clearly not right. – skan Feb 25 '22 at 10:08
  • @skan interval(ymd("2020-01-31"),ymd("2020-03-30")) %/% months(1) outputs one because only one full month passed between the dates. If you change to interval(ymd("2020-01-31"),ymd("2020-03-31")) %/% months(1), then the output is 2 because two full months passed. There are 31 days in March. – Susan Switzer Jun 20 '22 at 13:56
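
A minimal sketch of the `y*12 + m` correction that skoh's comment describes, for intervals longer than a year. The dates are made up, and the variable names (`int`, `p`, `total_months`) are only for illustration.

```r
library(lubridate)

# Made-up interval of one year and four months.
int <- interval(ymd("20160101"), ymd("20170501"))
p   <- as.period(int)

# month(p) alone returns only the months component of the period ...
months_component <- month(p)

# ... so the years component has to be added back in.
total_months <- year(p) * 12 + month(p)

print(months_component)
print(total_months)
```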

I give all credit for this answer to my talented work colleague. I neglected to include a reproducible example because whenever I wrote a simple approximation of my problem, df$Age1 <- df$interval %/% months(1) always worked, which left me totally stumped. It wasn't until I ran the code on my data frame of 650,000+ birthdates and event dates that the error message...

Error in est[start + est * per < end] <- est[start + est * per < end] + : NAs are not allowed in subscripted assignments

... came up at all. My colleague had the idea to run the calculation row by row with the following loop:

df$Age1 <- rep(NA, nrow(df))
for (i in seq_len(nrow(df))) {
  df$Age1[i] <- df$interval[i] %/% months(1)
}
df$Age1[1:15]

On my data frame, this made it plain that the calculation failed on row 13:

> df$interval[13]
[1] 1995-10-31 19:00:00 EST--1996-05-26 20:00:00 EDT

We aren't certain, but the fact that df$DOB[13] falls on 10/31 may be what trips it up. This kind of problem has been reported against lubridate before (it was unable to divide an interval by a period when one of the dates fell at the end of a month):

https://github.com/hadley/lubridate/issues/235

The way we came to a solution was by using as.period and then converting it to months:

df$Age1<- as.period(df$interval)
head(df$Age1)

[1] "1y 2m 26d 0H 0M 0S" "6m 15d 23H 0M 0S"  
[3] "4m 9d 23H 0M 0S"    "3m 19d 23H 0M 0S"  
[5] "3y 0m 25d 0H 0M 0S" "1y 1m 29d 1H 0M 0S"

df$Age1 <- df$Age1 %/% months(1)
head(df$Age1)

[1] 14  6  4  3 36 13
mcjudd

Here is another example of this reported issue with lubridate (1.3.3). Note that there may be different error messages depending on what else is in the dataset, and the issue seems to be dependent on the unit of measure (in my case months worked whereas years did not).

dat <- as.data.frame(list(Start = as.Date(c("1942-08-09", "1956-02-29")),
                          End   = as.Date(c("2007-07-31", "2007-09-13"))))

int0 <- with(dat, new_interval(Start, End))
as.period(int0, unit = "years")
"Error in est[start + est * per > end] <- est[start + est * per > end] -  : 
  NAs are not allowed in subscripted assignments"

int1 <- with(dat[1,], new_interval(Start, End))
as.period(int1, unit = "years")
[1] "64y 11m 22d 0H 0M 0S"

int2 <- with(dat[2,], new_interval(Start, End))
as.period(int2, unit = "years")
"Error in while (any(start + est * per > end)) est[start + est * per >  : 
  missing value where TRUE/FALSE needed"

as.period(int0) %/% years(1)
[1] 64 51

as.period(int0, unit = "months")
[1] "779m 22d 0H 0M 0S" "618m 15d 0H 0M 0S"
JWilliman

Instead of

df$Age1 <- df$interval %/% months(1)

you can try:

df$Age1 <- NA
df$Age1[!is.na(df$DOB)] <- df$interval[!is.na(df$DOB)] %/% months(1)
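
To make this reproducible, here is a small sketch on made-up data. It assumes a recent lubridate release, where interval() is the current name for new_interval(); the data frame contents are hypothetical.

```r
library(lubridate)

# Made-up data with one missing date of birth.
df <- data.frame(
  DOB   = as.Date(c("2010-01-15", NA, "2012-06-01")),
  Event = as.Date(c("2011-03-20", "2012-01-01", "2012-06-30"))
)

int <- interval(df$DOB, df$Event)
ok  <- !is.na(df$DOB)

# Divide only the rows that have a DOB; the rest stay NA.
df$Age1 <- NA
df$Age1[ok] <- int[ok] %/% months(1)
print(df)
```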
mgriebe