1

I have a timetable in SQL Server that has the [SERV_ID] (service-id), [STATION] (station), [ARR] (arrivaltime), [DEP] (departuretime) of a public transport vehicle. Every Service can be present every day [SERV_DAY].

Target is to summarize Serviceday, Service-line, First-station, Last-station, and the corresponding timestamps. --> One row per service per day.

For [SERV_ID] N170 this would be:

SERV_DAY                SERV_ID     FIRST_STATION   MIN_DEP                 LAST_STATION        MAX_ARR
2019-08-14 00:00:00     N170        Downtown        2019-08-14 06:06:00     CentralStation      2019-08-14 07:11:00

I tried to do this by partinioning thru ([SERV_DAY], [SERV_ID]) an then get MAX([ARR]) and MIN([DEP]) for each partition. This works so long, but now I want to get the corresponding Station to each Min and Max.

SELECT 
       [SERV_DAY],[SERV_ID],
       MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
       MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP 
FROM #demo

Later I need to add the delay at the last station, which is available in an extended version of the dataset as [ARR_EFFECTIVE] and [DEP_EFFECTIVE]. Hopefully I will be able to do add these attributes as soon as I know how to summarize the daily lines as described above.

This topic is close but I do not get how to adapt the "gap & island problem" Min() and Max() based on partition in sql server

I have set up a demo dataset in dbfiddle https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=52e53d43a49ddb8f67454e576bfa7d74

Can anyone help me to finalize the query?

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
Sam Amani
  • 85
  • 7

4 Answers4

1
SELECT 
       [SERV_DAY]
       ,[SERV_ID],
       FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by ARR DESC) Station1
       , FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by DEP ASC) Station2
FROM #demo
eshirvana
  • 23,227
  • 3
  • 22
  • 38
  • 1
    pay attention to that technique. Because when taking `first_value` of more than one column, those values might come from different rows. And so one column would not match the others. That may happen if more than one row provides the same rank with the order (here `ARR` and `DEP`). That's also the reason why I'm using row_number instead of dense_rank for the filtering. – casenonsensitive Sep 28 '20 at 19:31
  • 1
    @casenonsensitive I'm sorting by Datetime datetype ( Arr and Dep) , and based on the scenario , we can not have the same arrival time (ARR) or departure time (Dep) whiting one day , so this is not the case here. – eshirvana Sep 28 '20 at 20:10
0

You can do that in two steps: first add a row_number sorted by ARR descending and another row_number sorted by dep. Then you're able to filter on the rows with row_number = 1 in order to select other columns. Here's an example how to retrieve the station of the max_arr and the min_dep:

WITH T AS (
SELECT 
       [SERV_DAY], [SERV_ID],
       MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
       MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
       ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] DESC) AS RN_ARR,
       ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP]) AS RN_DEP,
       *
  FROM #demo
)
SELECT MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MAX_ARR_STATION,
       MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MIN_DEP_STATION,
       *
  FROM T
casenonsensitive
  • 955
  • 2
  • 9
  • 18
  • Thank you! Unfortunately `@SELECT [SERV_DAY], [SERV_ID], MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR, MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] DESC) AS RN_ARR, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP]) AS RN_DEP, * FROM #demo` Won't fit in a temptable or with as it has `[SERV_DAY], [SERV_ID]` as duplicated columns. is there a way to individually rename one version? – Sam Amani Sep 28 '20 at 16:57
  • @SamAmani instead of the `*` simply write all the columns you need to select. That way you'll ensure that no column name is used twice. – casenonsensitive Sep 28 '20 at 18:50
0

As reply to @casenonsensitive it works using his code and a little modification!

WITH T AS (

SELECT 
       [SERV_DAY], [SERV_ID], [STATION],
       MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
       MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
       ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] ) AS RN_ARR,
       ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP] ) AS RN_DEP
  FROM #demo
  )
  
 SELECT MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MIN_DEP_STATION,
       MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MAX_ARR_STATION, [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP from T
       group by [SERV_DAY], [SERV_ID], MIN_DEP, MAX_ARR
Sam Amani
  • 85
  • 7
0

I think I would use a temp table instead of a CTE if you have a large amount of data, but here is a quick idea on how that should work:

WITH CTE AS
(
    SELECT * 
    , ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY ARR  ) RN
    , ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY DEP  ) RN2
    from #demo
)
SELECT t1.[SERV_DAY],t1.[SERV_ID],t1.[STATION] FIRST_STATION, t1.[DEP] MIN_DEP, t2.STATION LAST_STATION
FROM CTE t1
INNER JOIN CTE t2 on t1.SERV_DAY = t2.SERV_DAY and t1.SERV_ID = t2.SERV_ID and t2.RN2 = 1
WHERE t1.RN = 1
JMabee
  • 2,230
  • 2
  • 9
  • 13
  • Thank you! This really seems to be working perfectly as this query is easily expandable to effective arrival, departure for later calculation of the delay! – Sam Amani Sep 29 '20 at 05:28
  • Awesome, if it meets you needs please mark it as you accepted answer. – JMabee Sep 29 '20 at 10:47
  • SO Sry, I'm not allowed to mark the answer on this crappy platform; Thanks for the feedback! Votes cast by those with less than 15 reputation are recorded, but do not change the publicly displayed post score – Sam Amani Oct 01 '20 at 05:23