1

I have the following situation, where I have some individuals with a start and an end date:

ID | start_date | end_date
1    2015-02-15   2015-04-20
2    2015-03-10   2015-06-15
...   ...            ...

Now, I need to derive a table with the individuals and all consecutive 30 day periods between their start and end dates (starting from the start_date). The result should look like this:

ID | period | from_date   | to_date
1     1       2015-02-15    2015-03-17
1     2       2015-03-18    2015-04-17
2     1       2015-03-10    2015-04-09
2     2       2015-04-10    2015-05-10
2     3       2015-05-11    2015-06-10

Do you have any idea how to create such a table in a clever way in MySQL? If MySQL is too cumbersome for such a data manipulation, R or Excel will work for me as well.

tholor
  • 55
  • 1
  • 6
  • Up to how many 30 day time periods will there be? – Kickstart Aug 20 '15 at 10:35
  • If excel works for you then you can just do a simple date sort in excel and identify 30 day periods and consecutive days by `=datedif(from_date,to_date,'d')` – Gaurav Aug 20 '15 at 10:36
  • The maximum of 30 day time periods for a single individual is 495, but on average an individual has 30 of these periods. – tholor Aug 20 '15 at 10:51

3 Answers3

1

You can generate a range of numbers, then cross join that range with all the records, adding as many groups of 30 days to that row as the number returned.

Something like this (not tested so please excuse any typos):-

SELECT a.id, b.aNum, DATE_ADD(a.start_date, INTERVAL (b.aNum * 30) DAY) AS from_date, DATE_ADD(a.start_date, INTERVAL ((b.aNum + 1) * 30) DAY) AS to_date
FROM sometable a
CROSS JOIN
(
    SELECT tens.aCnt * 10 + units.aCnt AS aNum
    FROM
    (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) units
    CROSS JOIN
    (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) tens
) b
WHERE  DATE_ADD(a.start_date, INTERVAL (b.aNum * 30) DAY) <= end_date

This version only works for up to 100 groups of 30 days but can easily be expanded (but will get slower the more groups you cope with)

Kickstart
  • 21,403
  • 2
  • 21
  • 33
1

Tricky question I must say.

Here's my attempt using the data.table package in R. First, I'll make sure you have a proper date formats in your data

library(data.table)
indx <- grep("date", names(df))
setDT(df)[, (indx) := lapply(.SD, as.Date), .SDcols = indx]

Then, we will compute a 30 day intervals per ID while adding cumulative index both to start and end columns

df[, 
     {
      temp <- seq.Date(start_date, end_date, by = "30 days")
      indx <- seq_along(temp[-(1L:2L)])
      .(
        Period = c(indx, length(temp) - 1L),
        from = c(temp[1L], temp[-c(1L, length(temp))] + indx),
        to = c(temp[2L], temp[-c(1L:2L)] + indx)
       )
      }
, by = ID]

#    ID Period       from         to
# 1:  1      1 2015-02-15 2015-03-17
# 2:  1      2 2015-03-18 2015-04-17
# 3:  2      1 2015-03-10 2015-04-09
# 4:  2      2 2015-04-10 2015-05-10
# 5:  2      3 2015-05-11 2015-06-10
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
0

I slightly adjusted Kickstart's code to solve all my requirements from the original post, maybe it helps somebody with a similar problem:

SELECT a.pid, b.aNum+1 as period, DATE_ADD(a.start_date, INTERVAL (b.aNum * 31) DAY) AS from_date, 
DATE_ADD(DATE_ADD(a.start_date, INTERVAL (b.aNum * 31) DAY), INTERVAL 30 DAY) AS to_date
FROM any_table a
CROSS JOIN
(
SELECT hundreds.aCnt*100 + tens.aCnt * 10 + units.aCnt AS aNum
FROM
(SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) units
CROSS JOIN
(SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) tens
 CROSS JOIN
(SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 0) hundreds
) b
WHERE  DATE_ADD(a.start_date, INTERVAL (b.aNum * 30)+30 DAY) <= end_date

Now, the next period starts one day after the previous ends and the last 30 day period of an individual ends before the end_date.

tholor
  • 55
  • 1
  • 6
  • You might need to use (b.aNum * 30) + 1 to add a multiple of 30 days plus one day. Ie, think you want 31 days, 61 days, 91 days, etc, in the future. While your code (and a bit of mine due to a typo) would give you 31, 62, 93, etc days in the future. – Kickstart Aug 20 '15 at 11:21
  • No, I do want 31,62, 93 etc. because the new period should always start one day after the previous period ends (-> no overlapping). – tholor Aug 21 '15 at 02:14