I have two tables in BigQuery - one which contains ppc advertising data, another containing enquiries. I'd like to join the two so I can report ppc revenue vs spend per day.
This initially felt pretty simple, but I've tried both a simple left join and subqueries and, having hit some snags with both, I am focusing on the left join.
I have:
#standardSQL
SELECT
CAST(ppc.Date AS DATE) AS Date,
COUNT(1) AS `Rows`,
COUNT(DISTINCT(ppc.ID)) AS `PPCRows`,
COUNT(DISTINCT(EnquiryId)) AS `EnquiryRows`
FROM
`db.ppc_data.adgroup_performance_summary_report` ppc
LEFT JOIN
`db.enquiries.output_final_scheduled` led
ON CAST(ppc.Date AS DATE) = CAST(led.EnquiryDateTime AS DATE)
WHERE
SUBSTR(CAST(led.EnquiryDateTime AS STRING), 1, 7) = "2018-01"
GROUP BY 1
Despite being defined as a left join, the data being returned indicates (I think) that this is doing a cross join - the value of the Rows
column is the product of PPC Rows
and Enquiry Rows
:
I don't really want to have to factor COUNT(DISTINCT(whatever))
into all the aggregate columns that I need to add next!
Also, it's taking an age to run - is there a more efficient way of writing this query?