1

I want to produce a sales report showing the price of each product at the time it was sold. The pricing is stored in a separate price history table, so I want to join with the most recent price that is prior to the sale date.

I have an idea of how to implement the query, but the database I'm working with (Vertica) doesn't seem to support what I want to do.

Here's a simplified version of the table structure:

Sales
-----
Date      Product ID
1/1/2001  1
2/2/2002  1
3/3/2003  1

PriceHistory
-------------
Date        ProductID     Price
12/31/2000  1             1.00
12/31/2001  1             1.01
12/31/2002  1             1.11

Here's an example of the report I want to generate from the above data:

Sales Report
------------
Date       ProductID     Price
1/1/2001   1             1.00
2/2/2002   1             1.01
3/3/2003   1             1.11

Here's the SQL I've written so far:

SELECT s.date,
       s.productid,
       ph.price
FROM Sales s,
     PriceHistory ph
WHERE s.productid=ph.productid
  AND ph.date=
    (SELECT MAX(date)
     FROM PriceHistory
     WHERE productid=s.productid
       AND date < s.date)

That might work on another DB platform, but Vertica gives me this error: "Non-equality correlated subquery expression is not supported". It doesn't like the date < s.date component of my subquery.

Is there another way to do this?

Later, I found a workaround that did work on Vertica, using two different references to the PriceHistory table in the query:

SELECT s.date,
       s.productid,
       ph.price
FROM Sales s
JOIN PriceHistory ph ON ph.date < s.date
WHERE s.productid=ph.productid
  AND ph.date=
    (SELECT MAX(date)
     FROM PriceHistory ph2
     WHERE ph2.productid=s.productid
       AND ph2.date = ph.date)

There may be a better way to do it, if so I'd love to hear it.

woot
  • 7,406
  • 2
  • 36
  • 55
Matt
  • 1,284
  • 14
  • 22

2 Answers2

2

Try the JOIN INTERPOLATE clause. For your example the following example produce exactly what yow want:

SELECT s.date,
   s.productid,
   ph.price
FROM Sales s
left outer join PriceHistory ph  ON  s.date INTERPOLATE
    PREVIOUS VALUE ph.date
where s.productid=ph.productid
woot
  • 7,406
  • 2
  • 36
  • 55
LucaCas
  • 21
  • 2
1

Issue is with this:

  SELECT MAX(date)
  FROM PriceHistory
  WHERE productid=s.productid
   AND date < s.date

S isn't defined...subqueries are stand alone and can't refer to tables in the from statement in the 'outside' query. Just do the join in your subquery.

SELECT MAX(date)
 FROM PriceHistory ph, Sales s
 WHERE ph.productid=s.productid
   AND date < s.date

I feel jealous knowing other people get to play on Vertica while I'm in MySQL. ugh..

Should mention you are using old syntax and the majority of this can be done as joins. Call the max_date a subquery and move it to the from statement...inner joins function as filters at that level.

Edit:

If that failed, it's vertica having some subquery problems...it's been a while but I think I've solved this one in the past (with vertica support):

SELECT s.date,
   s.productid,
   ph.price
FROM Sales s,
 PriceHistory ph,
(SELECT MAX(date)
  FROM PriceHistory ph, Sales s
  WHERE ph.productid=s.productid
AND date < s.date) a
WHERE s.productid=ph.productid
AND ph.date= max_date

Give that version a try...if that fails, I'd go back to vertica support and ask them for how to handle it (functionally, what you are asking for should be standard, I assume vertica just has a slightly altered way of getting there...I recall being asked to avoid using subqueries in where clauses).

Twelfth
  • 7,070
  • 3
  • 26
  • 34
  • Actually using s.productid inside the subquery doesn't pose a problem, it was using the non-equality operator in "date < s.date" that broke it. – Matt Oct 23 '13 at 18:42
  • I figured it'd be unable to figure out what the s.date meant vs date since s.date does not exist in the subquery. This might be a VSQL limitation?...solution there would be to change this over to an inner join statement. Let me edit an answer – Twelfth Oct 23 '13 at 18:56