I have a BigQuery table as below:
date hits_eventInfo_Category hits_eventInfo_Action session_id user_id hits_time hits_eventInfo_Label
20151021 Air Search 1445001 A232 1952 City1
20151021 Air Select 1445001 A232 2300 Vendor1
20151021 Air Search 1445001 A111 1000 City2
20151021 Air Search 1445001 A111 1900 City3
20151021 Air Select 1445001 A111 7380 Vendor2
20151021 Air Search 1445001 A580 1000 City4
20151021 Air Search 1445001 A580 1900 City5
20151021 Air Search 1445001 A580 1900 City6
20151021 Air Select 1445001 A580 7380 Vendor3
The table shows user activity for 3 users - A232, A111 and A580 such that:
i) A232 - Made 1 Search at 'City1' and chose 'Vendor1' from 'City1'
ii) A111 - Made the 1st search at 'City2' and did not choose any vendor from there. Made a 2nd search at 'City3' and then ultimately chose a 'Vendor2' from here.
iii) A580 - 1st search at 'City4', no vendor chosen. 2nd search at 'City5', no vendor chosen. 3rd search at 'City6', 'Vendor3' chosen from City6.
I am interested in only retrieving the city from which the user actually chose a vendor, that is, not interested in previous searches made by the user that did not result into choosing a vendor.
Required output table:
date hits_eventInfo_Category hits_eventInfo_Action session_id user_id hits_time city vendor
20151021 Air Search 1445001 A232 1952 City1 Vendor1
20151021 Air Search 1445001 A111 1900 City3 Vendor2
20151021 Air Search 1445001 A580 1900 City6 Vendor3
I have been trying to do this using the LAG function over the hits_eventInfo_eventLabel field after partitioning on user_id and ordering by hits_time i.e LAG(hits_eventInfo_eventLabel,1) OVER( PARTITION BY user_id ORDER BY hits_time)
However, since I am using my lag offset as 1, the above expression helps me get the desired output for user A232 only (as he made only 1 search which means the previous record from before selecting the vendor is for sure a search record).
Is there a way I can make this lag expression more dynamic such that it retrieves only the immediate location searched upon before making a selection - irrespective of how many searches were made before a selection was made?
OR
Is there an alternative function/route that I can take to achieve this?