
Regarding the solution to this question: wouldn't the correlated subquery be executed many times? Can anyone think of another query that would perform better on really big datasets?

Rather than joining with a correlated subquery, maybe by leveraging the LAG/LEAD functions? I'm just wondering. I would appreciate any input or ideas, as I'm facing a performance issue with a similar problem. I've been searching the net and have already tried various solutions, such as using ROW_NUMBER(), but so far I've found nothing that performs any better.

CREATE TABLE EXCHANGE_RATES (
VALID_FROM date,
RATE decimal
);

CREATE TABLE ENTRIES (
ID int NOT NULL,
DATE date,
PRIMARY KEY (ID)
);

INSERT INTO EXCHANGE_RATES VALUES
 ('2020-01-01', 101)
 --2020-01-02 missing
,('2020-01-03', 103)
,('2020-01-04', 104)
,('2020-01-05', 105)
 --2020-01-06 missing
 --2020-01-07 missing
 --2020-01-08 missing
,('2020-01-09', 109)
;

INSERT INTO ENTRIES VALUES 
 (1, '2020-01-02')
,(2, '2020-01-03')
,(3, '2020-01-04')
,(4, '2020-01-08')
;

And the current solution:

SELECT
 e.*
,r.RATE
FROM
ENTRIES e
LEFT JOIN EXCHANGE_RATES r
 ON r.VALID_FROM = (select max(VALID_FROM) from EXCHANGE_RATES where VALID_FROM <= e.DATE)
;

The results are fine; I'm just looking for a better way, if there is one:

ID  DATE        RATE
1   2020-01-02  101
2   2020-01-03  103
3   2020-01-04  104
4   2020-01-08  105
GMB
Tom

3 Answers


An alternative approach uses a lateral join:

select e.*, er.*
from entries e
outer apply (
    select rate, valid_from
    from exchange_rates er
    where er.valid_from <= e.date
    order by er.valid_from desc
    offset 0 rows fetch next 1 row only
) er

This syntax should work in both Oracle and SQL Server, which you both tagged. An index on EXCHANGE_RATES(VALID_FROM, RATE) might help speed up the query.
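For reference, that index could be created as below (a sketch; the index name is made up, and on SQL Server you could alternatively make RATE an INCLUDE column rather than a key column):

```sql
-- Composite index: VALID_FROM drives the backward range scan,
-- RATE makes the index covering so no table lookup is needed.
CREATE INDEX IX_RATES_VALID_FROM_RATE
    ON EXCHANGE_RATES (VALID_FROM, RATE);
```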

Demo on DB Fiddle:

ID | DATE       | rate | valid_from
-: | :--------- | ---: | :---------
 1 | 2020-01-02 |  101 | 2020-01-01
 2 | 2020-01-03 |  103 | 2020-01-03
 3 | 2020-01-04 |  104 | 2020-01-04
 4 | 2020-01-08 |  105 | 2020-01-05
GMB

I use this approach a lot on Teradata, where non-equi-joins are not very efficient.

First, UNION both tables, padding the missing columns with NULLs:

select id
  ,date
  ,NULL as rate
  ,NULL as valid_from
from ENTRIES
union all
select NULL as id
  ,valid_from
  ,rate
  ,valid_from
from EXCHANGE_RATES
order by 2,1

 id   date        rate  valid_from 
      2020-01-01  101   2020-01-01 
 1    2020-01-02                   
      2020-01-03  103   2020-01-03 
 2    2020-01-03                   
      2020-01-04  104   2020-01-04 
 3    2020-01-04                   
      2020-01-05  105   2020-01-05 
 4    2020-01-08                   
      2020-01-09  109   2020-01-09 

Then fill the missing data using LAG:

with union_rows as 
 (
    select id
      ,date
      ,NULL as rate
      ,NULL as valid_from
    from ENTRIES
    union all
    select NULL as id
      ,valid_from
      ,rate
      ,valid_from
    from EXCHANGE_RATES
 )
,fill_data as
 (
   select id, date 
     ,lag(rate,1,rate) 
      over (order by date,id) as rate
     ,max(valid_from) 
      over (order by date,id
            rows unbounded preceding) as valid_from
   from union_rows
 )
select * 
from fill_data

 id   date        rate  valid_from
      2020-01-01  101   2020-01-01
 1    2020-01-02  101   2020-01-01
      2020-01-03        2020-01-03
 2    2020-01-03  103   2020-01-03
      2020-01-04        2020-01-04
 3    2020-01-04  104   2020-01-04
      2020-01-05        2020-01-05
 4    2020-01-08  105   2020-01-05
      2020-01-09        2020-01-09

And finally, filter the result down to the rows that came from ENTRIES:

 where id is not null

This works well as long as there's only a single row per date and table, i.e. only a single NULL between rows. If there are multiple NULL rows in a row, you need LAG's IGNORE NULLS option, which is not supported by SQL Server.
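Where IGNORE NULLS is available (Oracle, for example), the fill_data step above could be rewritten with LAST_VALUE so the most recent non-NULL rate is carried across any number of consecutive entry rows (a sketch, reusing the names from the query above):

```sql
-- Sketch (Oracle): LAST_VALUE ... IGNORE NULLS carries the most
-- recent non-NULL rate forward, even across several entry rows.
fill_data as
 (
   select id, date
     ,last_value(rate ignore nulls)
      over (order by date, id
            rows unbounded preceding) as rate
     ,max(valid_from)
      over (order by date, id
            rows unbounded preceding) as valid_from
   from union_rows
 )
```

Note this still relies on the rate row sorting before the entry rows of the same date, as in the intermediate output shown earlier.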

See fiddle

dnoeth
  • Right, I have more nulls/gaps in my live data, so I guess I'd need IGNORE NULLS after all. I'll move my data to Oracle and try this solution out; it looks really promising, thank you! – Tom Sep 19 '20 at 16:53
  • Well, I tried this approach with the IGNORE NULLS option on my live data, but unfortunately it doesn't seem to perform better in my case. I don't know why, but it's a cool idea anyway :) – Tom Sep 22 '20 at 07:45

The question that you are referencing was asked eight years ago.

In any case, lag() and lead() work when the data is in a single table. That question is about finding the nearest value in another table, and the correlated subquery is a reasonable approach for that.

There are other methods as well, but they depend on the particular problem.
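One such method, sketched below: precompute a VALID_TO for each rate using LEAD() and turn the lookup into a range join (an illustration, not a benchmark; the last rate gets an open-ended interval):

```sql
-- Sketch: derive a VALID_TO per rate with LEAD(), then join on the
-- half-open interval [VALID_FROM, VALID_TO), which covers gap days.
with rate_ranges as (
    select
        rate,
        valid_from,
        lead(valid_from) over (order by valid_from) as valid_to
    from exchange_rates
)
select e.id, e.date, r.rate
from entries e
left join rate_ranges r
    on  e.date >= r.valid_from
    and (e.date < r.valid_to or r.valid_to is null);
```

Whether this beats the correlated subquery depends on how well the optimizer handles the range predicate.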

Gordon Linoff
  • Right, I was thinking of generating VALID_TO values in the other table first, for example, so the query could rely on something more than the VALID_FROM dates alone, if that makes sense. – Tom Sep 18 '20 at 16:58