11

I have a partitioned table and would love to use a MERGE statement, but for some reason doesn't work out.

MERGE `wr_live.p_email_event`  t
using `wr_live.email_event` s
on t.user_id=s.user_id and t.event=s.event and t.timestamp=s.timestamp
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
values (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)

I get

Cannot query over table 'wr_live.p_email_event' without a filter that can be used for partition elimination.

What's the proper syntax? Also is there a way I can express shorter the insert stuff? without naming all columns?

Pentium10
  • 204,586
  • 122
  • 423
  • 502

1 Answers1

15

What's the proper syntax?

As you can see from error message - your partitioned wr_live.p_email_event table was created with require partition filter set to true. This mean that any query over this table must have some filter on respective partitioned field

Assuming that timestamp IS that partitioned field - you can do something like below

MERGE `wr_live.p_email_event`  t
USING `wr_live.email_event` s
ON t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
AND DATE(t.timestamp) > CURRENT_DATE()  -- this is the filter you should tune 
WHEN NOT MATCHED THEN
INSERT (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)
VALUES (user_id,event,engagement_score,dest_email_domain,timestamp,tags,meta)   

So you need to make below line such that it in reality does not filter out whatever you need to be involved

AND DATE(t.timestamp) <> CURRENT_DATE()  -- this is the filter you should tune 

For example, I found, setting it to timestamp in future - in many cases addresses the issue, like

AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)   

Of course, if your wr_live.email_event table also partitioned with require partition filter set to true - you need to add same filter for s.timestamp

Also is there a way I can express shorter the insert stuff? without naming all columns?

BigQuery DML's INSERT requires column names to be specified - there is no way (at least that I am aware of) to avoid it using INSERT statement
Meantime, you can avoid this by using DDL's CREATE TABLE from the result of the query. This will not require listing the columns

For example, something like below

CREATE OR REPLACE TABLE `wr_live.p_email_event`
PARTITION BY DATE(timestamp) AS
SELECT * FROM `wr_live.p_email_event` 
WHERE DATE(timestamp) <> DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
UNION ALL 
SELECT * FROM `wr_live.email_event` s
WHERE NOT EXISTS (
  SELECT 1 FROM `wr_live.p_email_event` t
  WHERE t.user_id=s.user_id AND t.event=s.event AND t.timestamp=s.timestamp
  AND DATE(t.timestamp) > DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY)
)

You might also want to include table options list via OPTIONS() - but looks like filter attribute is not supported yet - so if you do have/need it - above will "erase" this attribute :o(

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230