2

Here are the exemple tables:

Product Prices (prd)

start_date product price
2023-04-01 prod_A 10.0
2023-04-15 prod_A 20.0
2023-04-01 prod_B 20.0

Order Products (ord)

order_date id product
2023-04-01 10001 prod_A
2023-04-01 10001 prod_B
2023-04-02 10002 prod_A
2023-04-02 10002 prod_B
2023-04-16 10003 prod_A
2023-04-16 10003 prod_B

Desired Result

order_date id product price
2023-04-01 10001 prod_A 10.0
2023-04-01 10001 prod_B 20.0
2023-04-02 10002 prod_A 10.0
2023-04-02 10002 prod_B 20.0
2023-04-16 10003 prod_A 20.0
2023-04-16 10003 prod_B 20.0

My first attempt was the following approach:

SELECT ord.order_date, ord.id, ord.product, prd.price
FROM tra
LEFT JOIN (
    SELECT *
    FROM prd
    ORDER BY prd.start_date ASC
) AS prd ON ord.id = prd.id AND ord.order_date >= prd.start_date

But some records keep getting duplicated, like:

Undesired Result

order_date id product price
2023-04-16 10003 prod_A 10.0
2023-04-16 10003 prod_B 20.0

I know why they are duplicated but don't know what to do. Any idea how I could fix the query?

GMB
  • 216,147
  • 25
  • 84
  • 135
  • 2
    Welcome to Stack Overflow! A well-asked question this is, with examples and code. I hope to see more contributions from you. – O. Jones Apr 27 '23 at 21:47
  • 2
    Tag your question properly. Which db are you using? Tagging every SQL vendor will not help you at all. – Eric Apr 27 '23 at 21:50
  • Please either ask re 1 bad query/function with obligatory [mre], including why you think it should return something else or are unsure at the 1st subexpression that it doesn't give what you expect or are stuck, justified by reference to authoritative documentation, or ask about your overall goal giving working parts you can do with justification & a [mre]. Then misunderstood code doesn't belong. But please ask about unexpected behaviour 1st because misconceptions get in the way of your goal. [ask] [Help] Basic questions are faqs. – philipxy Apr 27 '23 at 23:22
  • Nested ORDER BY without TOP/LIMIT has no effect, tables have no row order, result sets have row order per an outer ORDER BY. – philipxy Apr 27 '23 at 23:23

4 Answers4

2

The intent of the query is indeed a lateral join, as Joel Coehoorn points out.

Any idea how I could fix the query?

The technique is to move the predicates within the subquery.

In Postgres:

SELECT o.*, p.price, p.start_date
FROM ord o
LEFT JOIN LATERAL (
    SELECT p.*
    FROM prd p
    WHERE p.id = o.id and p.start_date <= o.order_date
    ORDER BY start_date DESC LIMIT 1
) p ON TRUE

One of the benefits of the lateral join is that it can return multiple rows and columns. But if all you want is the price, a scalar subquery is good enough as well.

GMB
  • 216,147
  • 25
  • 84
  • 135
1

This is called a lateral join, and the syntax is different (or may not be supported at all) depending on what kind of database you have. SQL Server uses APPLY, for example. Additionally, very often you can re-write the query using a windowing function for better performance.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
0

This is an other approach using ABS function to get the closest start date from the product table and the window function row_number() to sort the dataset and pick only the first match (the smallest Date_Diff )

with cte as (
  select o.*, p.price, ABS(order_date - start_date) as date_diff
  from ord o
  left join prd p on p.product = o.product
),
cte2 as (
  select *, row_number() over (partition by order_date, id, product order by date_diff asc) as rn
  from cte
)
select order_date,  id, product,    price
from cte2
where rn = 1;

Result :

order_date  id      product price
2023-04-01  10001   prod_A  10
2023-04-01  10001   prod_B  20
2023-04-02  10002   prod_A  10
2023-04-02  10002   prod_B  20
2023-04-16  10003   prod_A  20
2023-04-16  10003   prod_B  20

Demo here

SelVazi
  • 10,028
  • 2
  • 13
  • 29
0

I used postgresql and used Lead function to fix this issue. Try and see if this works for you.

SELECT o.order_date, o.id, o.product, pp.price AS price
FROM orders o
LEFT JOIN (
    SELECT product, price, 
           start_date, 
           LEAD(start_date) OVER (PARTITION BY product ORDER BY start_date) AS end_date 
    FROM product_prices
) pp ON o.product = pp.product AND o.order_date >= pp.start_date AND (o.order_date < pp.end_date OR pp.end_date IS NULL)
ORDER BY o.order_date, o.id, o.product;

I used an additional column "end_date" that represents the start date of the next price record for the same product. This is obtained by using a window function (LEAD) to get the next start date for each product.