0

I would like to compare price observations inside a rolling time window of 10 minutes.

Let's say I have the following table with prices (left most column) on the products and the time at which they were sold:

data have;
input @1 PRICE 1. @6 PRODUCT $3. @12 DATE_TIME anydtdtm15.;
format date_time datetime17.;
datalines;
1    POW   JAN-01-17 13:00
2    POW   JAN-01-17 13:04
1    POW   JAN-01-17 13:06
2    POW   JAN-01-17 13:15
3    POW   JAN-01-17 13:20
5    POW   JAN-01-17 13:29
1    GAS   JAN-01-17 13:05
2    GAS   JAN-01-17 13:10
1    GAS   JAN-01-17 13:39
;;;;
run;

What I need is a piece of code that will allow me to compare the prices in a 10 minute window. For example: the POW observation at 13:00 has a price of 1 and this price observation should be compared with all other price observations until 15:10 which are the same product. If there's a match on the price in this time window the matched rows should be spit out in a table.

This again has to be done for the observation at 13:04 and a price of 2, which then should be compared to observations until 13:14 and so on..

Can this be done?

double-beep
  • 5,031
  • 17
  • 33
  • 41
jjj
  • 35
  • 7
  • Of course it can be done, but what have you tried ? Did you get errors, combinatoric explosions or missing data ? Why does the 10-minute window starting at 13:00 end at 15:10 ? – Richard Mar 07 '19 at 19:39
  • Sorry, meant 13:10. Well, right now i'm working on adding 10 minutes to the datetime for each row and then joining the table on itself by price and product. Seems to be kinda working so far. Looks a bit clumsy though. – jjj Mar 07 '19 at 19:57
  • This actually solved the problem. Took me a while to get there, but I think I got it now. – jjj Mar 07 '19 at 20:13

1 Answers1

0

Was your solution a SQL reflexive (or self) join ?

In this sample self left join the rows are joined by product and price to enforce the same price search criteria. The join is further tightened by restricting the time frame in which the same prices are accepted as a match. left join is used in case there is no future with same price.

data have;
input @1 PRICE 1. @6 PRODUCT $3. @12 DATE_TIME anydtdtm15.;
format date_time datetime17.;
datalines;
1    POW   JAN-01-17 13:00
2    POW   JAN-01-17 13:04
1    POW   JAN-01-17 13:06
2    POW   JAN-01-17 13:15
3    POW   JAN-01-17 13:20
5    POW   JAN-01-17 13:29
1    GAS   JAN-01-17 13:05
2    GAS   JAN-01-17 13:10
1    GAS   JAN-01-17 13:39
1    GAS   JAN-01-17 13:42
1    GAS   JAN-01-17 13:44
1    GAS   JAN-01-17 13:52
;;;;
run;

proc sql;
  create table want as
  select self.*
  , twin.date_time as same_priced_future_date_time
  , twin.date_time - self.date_time as delta format=time8.
  from have as self
  left join have as twin
  on self.product = twin.product
  and self.price = twin.price
  and self.date_time < twin.date_time
  and twin.date_time - self.date_time <= '00:10:00't
  order by self.product, self.date_time, twin.date_time
  ;

There are other ways using DATA step; be it merge/lag, DOW loops, or hash

Richard
  • 25,390
  • 3
  • 25
  • 38
  • Sorry, I'm not sure what SQL reflexive means. What I did was make a new datetime variable which is the original datetime plus 10 minutes and then join the dataset on itself in a proc sql where one of the conditions for the join was a between (datetime and datetime plus 10) statement. This gave me the pairing that I needed. – jjj Mar 08 '19 at 21:12
  • *join the dataset on itself* - that's reflexive :). So is `have as self left join have as twin` in sample above. – Richard Mar 08 '19 at 21:22