
Regarding the solution to this question: wouldn't the correlated subquery be executed many times? Can anyone think of another query that would perform better on really big datasets?

Rather than joining with a correlated subquery, maybe by leveraging the LAG/LEAD functions? I'm just wondering. I would appreciate any input or ideas, as I'm facing a performance issue with a similar problem. I've been searching the net and have already tried various solutions, such as using ROW_NUMBER(), but so far I've found nothing that performs any better.

CREATE TABLE EXCHANGE_RATES (
VALID_FROM date,
RATE decimal
);

CREATE TABLE ENTRIES (
ID int NOT NULL,
DATE date,
PRIMARY KEY (ID)
);

INSERT INTO EXCHANGE_RATES VALUES
 ('2020-01-01', 101)
 --2020-01-02 missing
,('2020-01-03', 103)
,('2020-01-04', 104)
,('2020-01-05', 105)
 --2020-01-06 missing
 --2020-01-07 missing
 --2020-01-08 missing
,('2020-01-09', 109)
;

INSERT INTO ENTRIES VALUES 
 (1, '2020-01-02')
,(2, '2020-01-03')
,(3, '2020-01-04')
,(4, '2020-01-08')
;

And the current solution:

SELECT
 e.*
,r.RATE
FROM
ENTRIES e
LEFT JOIN EXCHANGE_RATES r
 ON r.VALID_FROM = (select max(VALID_FROM) from EXCHANGE_RATES where VALID_FROM <= e.DATE)
;

The results are fine; I'm just looking for a better way, if there is one:

ID  DATE        RATE
1   2020-01-02  101
2   2020-01-03  103
3   2020-01-04  104
4   2020-01-08  105
GMB
Tom

3 Answers


An alternative approach uses a lateral join:

select e.*, er.*
from entries e
outer apply (
    select rate, valid_from
    from exchange_rates er
    where er.valid_from <= e.date
    order by er.valid_from desc
    offset 0 rows fetch next 1 row only
) er

This syntax should work in both Oracle and SQL Server, which you both tagged. An index on EXCHANGE_RATES(VALID_FROM, RATE) might help speed up the query.
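For reference, that index could be created as below (a sketch; the index name is made up, and on SQL Server you could alternatively make RATE an INCLUDE column rather than a key column):

```sql
-- Composite index: VALID_FROM drives the backward range scan,
-- RATE makes the index covering so no table lookup is needed.
CREATE INDEX IX_RATES_VALID_FROM_RATE
    ON EXCHANGE_RATES (VALID_FROM, RATE);
```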

Demo on DB Fiddle:

ID | DATE       | rate | valid_from
-: | :--------- | ---: | :---------
 1 | 2020-01-02 |  101 | 2020-01-01
 2 | 2020-01-03 |  103 | 2020-01-03
 3 | 2020-01-04 |  104 | 2020-01-04
 4 | 2020-01-08 |  105 | 2020-01-05
GMB

I use this approach a lot on Teradata, where non-equi-joins are not very efficient.

First, UNION both tables, padding the missing columns with NULLs:

select id
  ,date
  ,NULL as rate
  ,NULL as valid_from
from ENTRIES
union all
select NULL as id
  ,valid_from
  ,rate
  ,valid_from
from EXCHANGE_RATES
order by 2,1

 id   date        rate  valid_from 
      2020-01-01  101   2020-01-01 
 1    2020-01-02                   
      2020-01-03  103   2020-01-03 
 2    2020-01-03                   
      2020-01-04  104   2020-01-04 
 3    2020-01-04                   
      2020-01-05  105   2020-01-05 
 4    2020-01-08                   
      2020-01-09  109   2020-01-09 

Then fill the missing data using LAG:

with union_rows as 
 (
    select id
      ,date
      ,NULL as rate
      ,NULL as valid_from
    from ENTRIES
    union all
    select NULL as id
      ,valid_from
      ,rate
      ,valid_from
    from EXCHANGE_RATES
 )
,fill_data as
 (
   select id, date 
     ,lag(rate,1,rate) 
      over (order by date,id) as rate
     ,max(valid_from) 
      over (order by date,id
            rows unbounded preceding) as valid_from
   from union_rows
 )
select * 
from fill_data

 id   date        rate  valid_from
      2020-01-01  101   2020-01-01
 1    2020-01-02  101   2020-01-01
      2020-01-03        2020-01-03
 2    2020-01-03  103   2020-01-03
      2020-01-04        2020-01-04
 3    2020-01-04  104   2020-01-04
      2020-01-05        2020-01-05
 4    2020-01-08  105   2020-01-05
      2020-01-09        2020-01-09

And finally, filter the result down to the rows that came from ENTRIES:

 where id is not null

This works well as long as there's only a single row per date and table, i.e. only a single NULL between rows. If there are multiple NULL rows in a row, you need LAG's IGNORE NULLS option, which is not supported by SQL Server.
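Where IGNORE NULLS is available (Oracle, for example), the fill_data step above could be rewritten with LAST_VALUE so the most recent non-NULL rate is carried across any number of consecutive entry rows (a sketch, reusing the names from the query above):

```sql
-- Sketch (Oracle): LAST_VALUE ... IGNORE NULLS carries the most
-- recent non-NULL rate forward, even across several entry rows.
fill_data as
 (
   select id, date
     ,last_value(rate ignore nulls)
      over (order by date, id
            rows unbounded preceding) as rate
     ,max(valid_from)
      over (order by date, id
            rows unbounded preceding) as valid_from
   from union_rows
 )
```

Note this still relies on the rate row sorting before the entry rows of the same date, as in the intermediate output shown earlier.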

See fiddle

dnoeth
  • Right, I have more nulls/gaps in my live data, so I guess I'd need IGNORE NULLS after all. I'll move my data to Oracle and try this solution out; it looks really promising, thank you! – Tom Sep 19 '20 at 16:53
  • Well, I tried this approach with the IGNORE NULLS option on my live data, but unfortunately it doesn't seem to perform better in my case. I don't know why, but it's a cool idea anyway :) – Tom Sep 22 '20 at 07:45

The question that you are referencing was asked eight years ago.

In any case, lag() and lead() work when the data is in a single table. That question is about finding the nearest value in another table, and the correlated subquery is a reasonable approach for that.

There are other methods as well, but they depend on the particular problem.
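One such method, sketched below: precompute a VALID_TO for each rate using LEAD() and turn the lookup into a range join (an illustration, not a benchmark; the last rate gets an open-ended interval):

```sql
-- Sketch: derive a VALID_TO per rate with LEAD(), then join on the
-- half-open interval [VALID_FROM, VALID_TO), which covers gap days.
with rate_ranges as (
    select
        rate,
        valid_from,
        lead(valid_from) over (order by valid_from) as valid_to
    from exchange_rates
)
select e.id, e.date, r.rate
from entries e
left join rate_ranges r
    on  e.date >= r.valid_from
    and (e.date < r.valid_to or r.valid_to is null);
```

Whether this beats the correlated subquery depends on how well the optimizer handles the range predicate.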

Gordon Linoff
  • Right, I was thinking of generating VALID_TO values in the other table first, for example, so the query could rely on something more than the VALID_FROM dates alone, if that makes sense. – Tom Sep 18 '20 at 16:58