
I have a dataframe of letters and dates:

Dates <- data.frame(X = c("A", "B", "C", "D"), Y = c("1/1/1988","1/1/2000","11/1/1996", "2/1/1990"))
Dates$Y <- as.Date(Dates$Y, "%m/%d/%Y")

I'm trying to turn this data frame into a symmetrical matrix where the values in the matrix are the absolute difference (in years) between the dates of all the possible combinations of letters. So the output would look like this:

Output <- matrix(c(0, 12.01, 8.84, 12.01, 0, 3.17, 8.84, 3.17, 0), nrow=3, ncol=3,
            dimnames = list(c("A", "B", "C"),
                            c("A", "B", "C")))
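The expected values above appear to correspond to dividing the day difference by 365. For example, A to B is 4383 days, and 4383 / 365 ≈ 12.01. Here is a minimal base-R sketch under that assumption. It converts the dates to day counts and takes all pairwise differences with `outer`. The variable names `days` and `yrs365` are illustrative. The sketch keeps all four letters, and its A, B, C block should match the matrix above.

```r
Dates <- data.frame(X = c("A", "B", "C", "D"),
                    Y = c("1/1/1988", "1/1/2000", "11/1/1996", "2/1/1990"))
Dates$Y <- as.Date(Dates$Y, "%m/%d/%Y")

# Date values are stored as days since 1970-01-01, so as.numeric() gives day counts
days <- as.numeric(Dates$Y)

# All pairwise absolute differences in days, converted to years.
# Assumption: a 365-day year, which seems to be what the expected output used.
yrs365 <- round(abs(outer(days, days, "-")) / 365, 2)
dimnames(yrs365) <- list(as.character(Dates$X), as.character(Dates$X))
yrs365
```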

Thank you so much in advance!

1 Answer


We can use `outer` together with a custom function that calculates the date difference in years.

# pairwise difference in weeks, converted to years at 52.25 weeks per year
outer(Dates$Y, Dates$Y, FUN = function(x, y)
     round(abs(as.numeric(difftime(x, y, units = "weeks")) / 52.25), 2))

#      [,1]  [,2] [,3] [,4]
#[1,]  0.00 11.98 8.82 2.08
#[2,] 11.98  0.00 3.16 9.90
#[3,]  8.82  3.16 0.00 6.74
#[4,]  2.08  9.90 6.74 0.00

The code to calculate the date difference in years is taken from here.
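For reference, 52.25 weeks is 365.75 days. The question's expected values seem to use 365 days instead, and that difference in assumed year length appears to explain the small mismatches (11.98 vs 12.01, and so on). A 365.25-day year is another common convention. The sketch below swaps in a different base-R technique, `stats::dist()`, which on a single numeric column returns the absolute pairwise differences. `as.matrix()` then expands the result into the full symmetric matrix, and naming the input vector carries the letters through as row and column labels. The 365.25 divisor and the names `days` and `yrs` are assumptions for illustration.

```r
Dates <- data.frame(X = c("A", "B", "C", "D"),
                    Y = c("1/1/1988", "1/1/2000", "11/1/1996", "2/1/1990"))
Dates$Y <- as.Date(Dates$Y, "%m/%d/%Y")

# Day counts (days since 1970-01-01), named by letter so dist() keeps the labels
days <- setNames(as.numeric(Dates$Y), as.character(Dates$X))

# dist() on one numeric column gives |days_i - days_j|; as.matrix() expands the
# lower triangle into the full symmetric matrix with a zero diagonal.
# Assumption: a 365.25-day year.
yrs <- round(as.matrix(dist(days)) / 365.25, 2)
yrs
```

With a 365.25-day year, A to B (4383 days) comes out as exactly 12.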


As @thelatemail mentioned in the comments, it is more efficient (and tidier) to apply `abs` and the division once, outside of `outer`, after the matrix has been built. This also avoids the need for the anonymous function:

abs(outer(Dates$Y, Dates$Y, difftime, units="weeks") / 52.25)
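Note that `difftime()` returns a `difftime` object, and `outer()`, `/` and `abs()` all keep that class. As a result, the matrix above still reports its units as "weeks" even though the values are now years. Here is a minimal sketch of stripping the class and attaching the letters as labels. The names `res` and `yrs` are illustrative.

```r
Dates <- data.frame(X = c("A", "B", "C", "D"),
                    Y = c("1/1/1988", "1/1/2000", "11/1/1996", "2/1/1990"))
Dates$Y <- as.Date(Dates$Y, "%m/%d/%Y")

# Vectorised version from the comment: one difftime() call over all pairs
res <- abs(outer(Dates$Y, Dates$Y, difftime, units = "weeks") / 52.25)
inherits(res, "difftime")  # TRUE: the units attribute still says "weeks"

# as.numeric() drops the difftime class (and the dim); rebuild a plain
# numeric matrix with the letters as row and column names
yrs <- matrix(as.numeric(res), nrow = nrow(Dates),
              dimnames = list(as.character(Dates$X), as.character(Dates$X)))
round(yrs, 2)
```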
Ronak Shah
  • It is probably more efficient to do the `abs/division` outside of the `outer` call once you already have the matrix - `abs(outer(Dates$Y, Dates$Y, difftime, units="weeks") / 52.25)` for instance. Avoids the need for the anonymous function too. – thelatemail Apr 24 '18 at 04:25
  • @RonakShah that's fantastic! Thank you so much! That is extremely helpful. – user9351962 Apr 24 '18 at 20:26