2

I have a table that needs to be split on the basis of datetime

Input Table

ID|    Start              |    End
--------------------------------------------
A |    2019-03-04 23:18:04|    2019-03-04 23:21:25
--------------------------------------------
A |    2019-03-04 23:45:05|    2019-03-05 00:15:14
--------------------------------------------

Required Output

ID|    Start              |    End
--------------------------------------------
A |    2019-03-04 23:18:04|    2019-03-04 23:21:25
--------------------------------------------
A |    2019-03-04 23:45:05|    2019-03-04 23:59:59
--------------------------------------------
A |    2019-03-05 00:00:00|    2019-03-05 00:15:14
--------------------------------------------

Thanks!!

ExelIQ
  • 49
  • 7
  • What about the time 23:59:59.001 to 23:59:59.999? – S-Man Jul 22 '19 at 08:54
  • The data format only has a resolution up to the second. – ExelIQ Jul 22 '19 at 08:58
  • Is it possible that the range is above more than 2 dates? maybe 2019-03-05 - 2019-03-08? – S-Man Jul 22 '19 at 09:22
  • What if the interval crosses more than one day? e.g. `(start, end)` = `(2019-01-01 00:00:00, 2019-01-05 00:00:00)`? –  Jul 22 '19 at 09:24
  • I checked the table doesn't seem to have a range over two days. You make a good point, it should deal with such occurrences if they do appear in the future. – ExelIQ Jul 22 '19 at 09:46

4 Answers4

1

Try this below code. This will only work if the start and end date fall in two consecutive day. Not if the start and end date difference is more than 1 day.

MSSQL:

SELECT ID,[Start],[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) = 0

UNION ALL


SELECT ID,[Start], CAST(CAST(CAST([Start] AS DATE)  AS VARCHAR(MAX)) +' 23:59:59' AS DATETIME)
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0

UNION ALL

SELECT ID,CAST(CAST([End] AS DATE) AS DATETIME),[End]
FROM Input_Table A
WHERE DATEDIFF(DD,[Start],[End]) > 0

ORDER BY 1,2,3

PostgreSQL:

SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'),
TO_TIMESTAMP(endDate, 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) - 
    DATE_PART('day',startDate::date) = 0

UNION ALL


SELECT ID,
TO_TIMESTAMP(startDate,'YYYY-MM-DD HH24:MI:SS'), 
TO_TIMESTAMP(CONCAT(CAST(CAST (startDate AS DATE) AS VARCHAR) , 
    ' 23:59:59') , 'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) - 
    DATE_PART('day',startDate::date) > 0

UNION ALL


SELECT ID,
TO_TIMESTAMP(CAST(CAST (endDate AS DATE) AS VARCHAR) ,
    'YYYY-MM-DD HH24:MI:SS')    ,
TO_TIMESTAMP(endDate,'YYYY-MM-DD HH24:MI:SS')
FROM mytemp A
WHERE DATE_PART('day', endDate::date) - 
    DATE_PART('day',startDate::date) > 0;

PostgreSQL Demo Here

mkRabbani
  • 16,295
  • 2
  • 15
  • 24
  • All refer to Input_Table A, is there something im missing? – ExelIQ Jul 22 '19 at 09:52
  • No, its all input from Input_Table A. – mkRabbani Jul 22 '19 at 09:53
  • Postgres version added to the answer. – mkRabbani Jul 22 '19 at 11:04
  • All those casts and `to_timestamp()` calls seem rather useless if those two columns are defined as `timestamp` (and who in their right mind would store timestamps in a varchar column). Also `DATE_PART('day', endDate::date) - DATE_PART('day',startDate::date) = 0` can be simplified to `enddate::date - starddate::date) = 0` or simply `enddate::date = startdate::date` –  Jul 22 '19 at 11:42
1

demo:db<>fiddle

This works even when range crosses more than one day

WITH cte AS (
    SELECT
        id,
        start_time,
        end_time,
        gs,
        lag(gs) over (PARTITION BY id ORDER BY gs)                                 -- 2
    FROM
        a
    LEFT JOIN LATERAL
        generate_series(start_time::date + 1, end_time::date, interval '1 day') gs  --1
    ON TRUE
)
SELECT                                                                             -- 3
    id,
    COALESCE(lag, start_time) AS start_time,
    gs - interval '1 second'
FROM
    cte
WHERE gs IS NOT NULL

UNION

SELECT DISTINCT ON (id)                                                           -- 4
    id,
    CASE WHEN start_time::date = end_time::date THEN start_time ELSE end_time::date END,  -- 5
    end_time
FROM
    cte
  1. CTE: the generate_series function generates one row per day new day. So, there is no value if there is no date change
  2. CTE: the lag() window function allows to move the current date value into the next row (the current end is the next start)
  3. With this data set you can calculate the new start and end values. If there is no gs value: There is no date change. This ignored at this point. For all cases with date changes: If there is no lag value, it is the beginning (so it cannot got a previous value). In this case, the normal start_time is taken, otherwise it is a new day which takes the date break time. The end_time is taken with the last second of the day (interval - '1 second')
  4. The second part: Because of the date breaks there is always one additional record which need to be unioned. The last record is from the beginning of the end_time (so cast to date). The CASE clause combines this step with the case of no date change which has been ignored so far. So if start_time and end_time are at the same date, here the original start_time is taken.
S-Man
  • 22,521
  • 7
  • 40
  • 63
  • Hi, CTE doesnt seem to work on my IDE, which is Postgres. – ExelIQ Jul 22 '19 at 09:45
  • What version are you using? – S-Man Jul 22 '19 at 09:47
  • Im using workbench/j(build 124) and connected to a Postgres DB through AWS. – ExelIQ Jul 22 '19 at 09:55
  • That does not help me ;) – S-Man Jul 22 '19 at 10:05
  • I received the following error - Invalid operation: syntax error at or near "::" for code --- generate_series(utc_tune_start_time::date + 1, utc_tune_end_time::date, interval '1 day') gs – ExelIQ Jul 22 '19 at 10:08
  • CTE was used as a solution for a similar problem. Im not sure if CTE is supported as a function. Yes, I am using Amazon Redshift. – ExelIQ Jul 22 '19 at 10:10
  • OK, the problem is not the CTE, but the generate_series function with date parameters. This is not supported by Redshift. You should always name your database system in your question! – S-Man Jul 22 '19 at 10:13
  • 1
    @JASONMENDES As all answers solve the problem that you mentioned in the question (which not included the Redshift problem at the beginning), it would be great if you upvote all of them! Upvotes honor the time and work the repliers invested into your question. If one one of them completely solves your problem, don't forget to accept this as well! – S-Man Jul 22 '19 at 10:18
  • 1
    @JASONMENDES: you should have mentioned from the start that you are using Amazon Redshift, not Postgres. Although they share some ancient roots they are [quite different](http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html) –  Jul 22 '19 at 10:19
0

Simulate loop for interval generation using recursive CTE, i.e. take range from start to midnight in seed row, take another day in subsequent rows etc.

with recursive input as (
  select 'A' as id, timestamp '2019-03-04 23:18:04' as s, timestamp '2019-03-04 23:21:25' as e union
  select 'A' as id, timestamp '2019-03-04 23:45:05' as s, timestamp '2019-03-05 00:15:14' as e union
  select 'B' as id, timestamp '2019-03-06 23:45:05' as s, timestamp '2019-03-08 00:15:14' as e union
  select 'C' as id, timestamp '2019-03-10 23:45:05' as s, timestamp '2019-03-15 00:15:14' as e
), generate_id as (
  select row_number() over () as unique_id, * from input
), rec (unique_id, id, s, e) as (
  select unique_id, id, s, least(e, s::date::timestamp + interval '1 day')
  from generate_id seed
  union
  select remaining.unique_id, remaining.id, previous.e, least(remaining.e, previous.e::date::timestamp + interval '1 day')
  from rec as previous
  join generate_id remaining on previous.unique_id = remaining.unique_id and previous.e < remaining.e
)
select id, s, e from rec
order by id,s,e

Note:

  • your id column appears not to be unique, so I added custom unique_id column. If id was unique, CTE generate_id was unnecessary. Uniqueness is unavoidable for recursive query to work.
  • close-open range is better for representation of such data, rather than close-close range. So end time in my query returns 00:00:00, not 23:59:59. If it's not suitable for you, modify query as an exercise.

UPDATE: query works on Postgres. OP originally tagged question postgres, then changed tag to redshift.

Tomáš Záluský
  • 10,735
  • 2
  • 36
  • 64
0

Unfortunately, Redshift doesn't have a convenient way to generate a series of numbers. If you table is big enough, you can use it to generate numbers. "Big enough" means that the number of rows is greater than the longest span. Perhaps another table would work, if not this one.

Once you have that, you can use this logic:

with n as (
      select row_number() over () - 1 as n
      from t
     )
select t.id,
       greatest(t.s, date_trunc('day', t.s) + n.n * interval '1 day') as s,
       least(t.e, date_trunc('day', t.s) + (n.n + 1) * interval '1 day' - interval '1 second') as e
from t join
     n
     on t.e >= date_trunc('day', t.s) + n.n * interval '1 day';

Here is a db<>fiddle. It uses an old version of Postgres, but not quite old enough for Redshift.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Could you please explain the idea? The query seems incorrect to me, produces wrong result for expanding range with id=C in https://dbfiddle.uk/?rdbms=postgres_8.4&fiddle=be5799a45a1812907c1953608b191906 (example from my answer). This is why I ended with recursive CTE, which is also standard-compliant solution. – Tomáš Záluský Jul 22 '19 at 11:42
  • @TomášZáluský . . . There are not enough rows in the table to generate enough numbers for that example. That is what the first paragraph is explaining. A recursive CTE is nice, but Redshift does not support that functionality. – Gordon Linoff Jul 22 '19 at 11:57