4

Is the following actually possible in SQL?

I have some time-series data and I want to extract some entry and exit points based on prices.

Desired output:

enter image description here

Example Data:

enter image description here

SQL Data:

CREATE TABLE Control
    ([PKey] int, [TimeStamp] datetime, [Name] varchar(10), [Price1] float, [Price2] float);

INSERT INTO Control ([PKey], [TimeStamp], [Name], [Price1], [Price2])
VALUES
    (1,'2018-10-01 09:00:00', 'Name1',120, 125),
    (2,'2018-10-01 09:10:00', 'Name1',110, 115),
    (3,'2018-10-01 09:20:00', 'Name1',101, 106),
    (4,'2018-10-01 09:30:00', 'Name1',105, 110),
    (5,'2018-10-01 09:40:00', 'Name1',106, 111),
    (6,'2018-10-01 09:50:00', 'Name1',108, 113),
    (7,'2018-10-01 10:00:00', 'Name1',110, 115),
    (8,'2018-10-01 10:10:00', 'Name1',104, 109),
    (9,'2018-10-01 10:20:00', 'Name1',101, 106),
    (10,'2018-10-01 10:30:00', 'Name1',99, 104),
    (11,'2018-10-01 10:40:00', 'Name1',95, 100),
    (12,'2018-10-01 10:50:00', 'Name1',101, 106),
    (13,'2018-10-01 11:00:00', 'Name1',102, 107),
    (14,'2018-10-01 11:10:00', 'Name1',101, 106),
    (15,'2018-10-01 11:20:00', 'Name1',99, 104),
    (16,'2018-10-01 11:30:00', 'Name1',105, 110),
    (17,'2018-10-01 11:40:00', 'Name1',108, 113),
    (18,'2018-10-01 11:50:00', 'Name1',108, 113),
    (19,'2018-10-01 12:00:00', 'Name1',109, 114),
    (20,'2018-10-01 12:10:00', 'Name1',108, 113),
    (21,'2018-10-01 12:20:00', 'Name1',105, 110),
    (22,'2018-10-01 12:30:00', 'Name1',101, 106),
    (23,'2018-10-01 12:40:00', 'Name1',102, 107),
    (24,'2018-10-01 09:00:00', 'Name2',103, 108),
    (25,'2018-10-01 09:10:00', 'Name2',101, 106),
    (26,'2018-10-01 09:20:00', 'Name2',104, 109),
    (27,'2018-10-01 09:30:00', 'Name2',106, 111),
    (28,'2018-10-01 09:40:00', 'Name2',108, 113),
    (29,'2018-10-01 09:50:00', 'Name2',108, 113),
    (30,'2018-10-01 10:00:00', 'Name2',105, 110),
    (31,'2018-10-01 10:10:00', 'Name2',103, 108),
    (32,'2018-10-01 10:20:00', 'Name2',101, 106),
    (33,'2018-10-01 10:30:00', 'Name2',99, 104),
    (34,'2018-10-01 10:40:00', 'Name2',101, 106),
    (35,'2018-10-01 10:50:00', 'Name2',104, 109),
    (36,'2018-10-01 11:00:00', 'Name2',101, 106),
    (37,'2018-10-01 11:10:00', 'Name2',99, 104),
    (38,'2018-10-01 11:20:00', 'Name2',106, 111),
    (39,'2018-10-01 11:30:00', 'Name2',103, 108),
    (40,'2018-10-01 11:40:00', 'Name2',105, 110),
    (41,'2018-10-01 11:50:00', 'Name2',108, 113),
    (42,'2018-10-01 12:00:00', 'Name2',105, 110),
    (43,'2018-10-01 12:10:00', 'Name2',104, 109),
    (44,'2018-10-01 12:20:00', 'Name2',108, 113),
    (45,'2018-10-01 12:30:00', 'Name2',110, 115),
    (46,'2018-10-01 12:40:00', 'Name2',105, 110)
;

What have I tried:

I am able to get the first instance of an entry and exit point using the following query which finds the first entry point PKey and then finds the first exit point after the entry point PKey

declare @EntryPrice1 float = 101.0; -- Entry when Price1 <= 101.0 (when not already Entered)
declare @ExitPrice2 float = 113.0; -- Exit when Price2 >= 113.0 (after Entry only)

select 
 t1.[Name]
,t2.[Entry PKey]
,min(case when t1.[Price2] >= @ExitPrice2 and t1.[PKey] > t2.[Entry PKey] then t1.[PKey] else null end) as [Exit PKey]
from [dbo].[Control] t1
left outer join
(select min(case when [Price1] <= @EntryPrice1 then [PKey] else null end) as [Entry PKey]
,[Name]
from [dbo].[Control]
group by [Name]) t2
on t1.[Name] = t2.[Name]
group by t1.[Name],t2.[Entry PKey]

--Name  Entry PKey  Exit PKey
--Name1     3       6
--Name2     25      28

enter image description here

I'm stuck on the approach to use that will allow multiple entry/exit points to be returned and not sure if it's even possible in SQL.

The logic for entry an exit points are

Entry - when price1 <= 101.0 and not already in an entry that has not exited.

Exit - when price2 >= 113.0 and inside an entry.

2 Answers2

1

It's a kind of gaps and islands problem, this is a generic solution using Windowed Aggregates (should work for most DBMSes):

declare @EntryPrice1 float = 101.0; -- Entry when Price1 <= 101.0 (when not already Entered)
declare @ExitPrice2 float = 113.0; -- Exit when Price2 >= 113.0 (after Entry only)

WITH cte AS 
 ( -- apply your logic to mark potential entry and exit rows
   SELECT *
     ,CASE WHEN Price1 <= @EntryPrice1 THEN Timestamp END AS possibleEntry 
     ,CASE WHEN Price2 >= @ExitPrice2  THEN Timestamp END AS possibleExit
     ,Max(CASE WHEN Price1 <= @EntryPrice1 THEN Timestamp END) -- most recent possibleEntry
      Over (PARTITION BY Name
            ORDER BY Timestamp
            ROWS Unbounded Preceding) AS lastEntry 
     ,Max(CASE WHEN Price2 >= @ExitPrice2 THEN Timestamp END) -- most recent possibleExit
      Over (PARTITION BY Name
            ORDER BY Timestamp
            ROWS BETWEEN Unbounded Preceding AND 1 Preceding) AS lastExit
   FROM [dbo].[Control] 
 )
-- SELECT * FROM cte ORDER BY Name, PKey
,groupRows AS 
 ( -- mark rows from the 1st entry to the exit row
   SELECT *
     -- if lastEntry <= lastExit we're after an exit and before an entry -> don't return this row
     ,CASE WHEN lastEntry <= lastExit THEN 0 ELSE 1 END AS returnFlag
     -- assign the same group number to consecutive rows in group 
     ,Sum(CASE WHEN lastEntry <= lastExit THEN 1 ELSE 0 END)
      Over (PARTITION BY Name
            ORDER BY Timestamp
            ROWS Unbounded Preceding) AS grp
   FROM cte
   WHERE (possibleEntry IS NOT NULL OR possibleExit IS NOT NULL)
     AND lastEntry IS NOT NULL
 )
-- SELECT * FROM groupRows ORDER BY  Name, PKey
,rowNum AS
 ( -- get the data from the first and last row of an entry/exit group
   SELECT *
     -- to get the values of the 1st row in a group
     ,Row_Number() Over (PARTITION BY Name, grp ORDER BY Timestamp) AS rn
     -- to get the values of the last row in a group
     ,Last_Value(Price2)
      Over (PARTITION BY Name, grp
            ORDER BY Timestamp
            ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitPrice
     ,Last_Value(possibleExit)
      Over (PARTITION BY Name, grp
            ORDER BY Timestamp
            ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitTimestamp
     ,Last_Value(CASE WHEN possibleExit IS NOT NULL THEN PKey END)
      Over (PARTITION BY Name, grp
            ORDER BY Timestamp
            ROWS BETWEEN Unbounded Preceding AND Unbounded Following) AS ExitPKey
   FROM groupRows
   WHERE returnFlag = 1
 )
SELECT Name
  ,Price1 AS EntryPrice
  ,ExitPrice
  ,Timestamp AS EntryTimestamp
  ,ExitTimestamp
  ,PKey AS EntryPKey
  ,ExitPKey
FROM rowNum
WHERE rn = 1 -- return 1st row of each group
ORDER BY Name, Timestamp

See dbfiddle

Of course it might be possible to simplify the logic or apply some proprietary SQL Server syntax...

dnoeth
  • 59,503
  • 4
  • 39
  • 56
  • Excellent, thank you. I need to spend some time examining how this works. –  Oct 01 '18 at 19:35
0

This is a weird form of gaps-and-islands. Start with the very basic definitions of entry and exit:

  select c.*,
         (case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
         (case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit
  from control c;

This doesn't quite work because two adjacent "entries" count only as a single entry. We can get the information we need by looking at the previous entry/exit time. With that logic, we can determine which entries are "real". We might as well get the next exit time as well:

with cee as (
      select c.*,
             (case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
             (case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit
      from control c
     ),
     cp as (
      select cee.*,
             max(case when is_entry = 1 then pkey end) over (partition by name order by timestamp rows between unbounded preceding and 1 preceding) as prev_entry,
             max(case when is_exit = 1 then pkey end) over (partition by name order by timestamp) as prev_exit,
             min(case when is_exit = 1 then pkey end) over (partition by name order by timestamp desc) as next_exit
      from cee
     )

Next, use this logic to generate a cumulative sum of real entries, and then do some fancy filtering:

with cee as (
      select c.*,
             (case when [Price1] <= @EntryPrice1 then 1 else 0 end) as is_entry,
             (case when [Price1] >= @ExitPrice1 then 1 else 0 end) as is_exit
      from control c
     ),
     cp as (
      select cee.*,
             max(case when is_entry = 1 then pkey end) over (partition by name order by timestamp rows between unbounded preceding and 1 preceding) as prev_entry,
             max(case when is_exit = 1 then pkey end) over (partition by name order by timestamp) as prev_exit,
             min(case when is_exit = 1 then pkey end) over (partition by name order by timestamp desc) as next_exit
      from cee
     )
select *
from cp
where cp.is_entry = 1 and
      (prev_entry is null or prev_exit > prev_entry)

This gives you the rows where the entry starts. You can join in to get the additional information you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The columns `prev_entry`, `prev_exit` and `next_exit`do not always give the correct Pkey.For example, for Pkey 3 I would expect prev_entry = null but it's 25? –  Oct 01 '18 at 19:53
  • @pathDongle . . . If `timestamp` and `pkey` are ordered the same way (as they are for the sampled data), I don't see how that could happen. – Gordon Linoff Oct 01 '18 at 20:00
  • The `timestamp` and `pkey` are in the order decribed. I think the issue is that there are two prices rather than one. The exit condition should be `(case when [Price2] >= @ExitPrice2 then 1 else 0 end) as is_exit`. Also, entry point at PKey=3 is missing –  Oct 01 '18 at 20:23
  • @pathDongle . . . Oh, if there are two prices you care about then the flags should be set accordingly. – Gordon Linoff Oct 01 '18 at 20:30
  • Sorry to be a pain. I still can't get it working, PKey 3 is an entry but the query gives prev_entry = 25 which cannot be correct, it should be null. –  Oct 01 '18 at 20:40
  • @pathDongle . . . Your question doesn't seem to mention it, but you want to do this separately for each name. – Gordon Linoff Oct 01 '18 at 22:14