-1

I have a listing of all consumer purchases where some consumers make many purchases over the time frame in scope. I'd like to populate a column with the location of each consumer's first purchase but I'm getting this error:

Error in SQL statement: ParseException: 
mismatched input '(' expecting <EOF>(line 2, pos 25)

== SQL ==
SELECT consumer_id
       ,location OVER(partition BY table.consumer_id) AS first_purchase_site
---------------------^^^
FROM table

For clarity, here is my query:

SELECT consumer_id
       ,location OVER(partition BY table.consumer_id) AS first_purchase_site
FROM table
WHERE consumer_purchase_order_sequence = 1
June Smith
  • 163
  • 1
  • 11
  • You should use window functions https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-window.html – MertG Mar 04 '21 at 19:57
  • partition by works with aggregate functions. you can not use `location over()`. Pls use this `SELECT consumer_id ,location from ( SELECT a.*, row_number() OVER(partition BY table.consumer_id) AS rn from Table a) rs WHERE rs.rn=1 ` – Koushik Roy Mar 04 '21 at 20:03

3 Answers3

1

I'd like to populate a column with the location of each consumer's first purchase

Are you looking for first_value()?

SELECT consumer_id,
       FIRST_VALUE(location) OVER (partition BY table.consumer_id) AS first_purchase_site
FROM table;

Your window function is, errr, missing the function.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

You need window function FIRST_VALUE():

SELECT DISTINCT consumer_id,
       FIRST_VALUE(location) OVER(PARTITION BY consumer_id ORDER BY consumer_purchase_order_sequence) AS first_purchase_site
FROM table

Change consumer_purchase_order_sequence with the column that orders the purchases.

forpas
  • 160,666
  • 10
  • 38
  • 76
0

Its hard to do it with window calculations .You can do it with joins,

SELECT 
  table.consumer_id,
  table.location,
  a.first_purchase_site
FROM table LEFT JOIN
  (SELECT consumer_id,location AS first_purchase_site FROM table WHERE 
  consumer_purchase_order_sequence = 1) a ON a.consumer_id=table.consumer_id