-1

I have two tables in BigQuery - one which contains ppc advertising data, another containing enquiries. I'd like to join the two so I can report ppc revenue vs spend per day.

This initially felt pretty simple, but I've tried both a simple left join and subqueries and, having hit some snags with both, I am focusing on the left join.

I have:

#standardSQL
SELECT 
  CAST(ppc.Date AS DATE) AS Date,
  COUNT(1) AS `Rows`,
  COUNT(DISTINCT(ppc.ID)) AS `PPCRows`,
  COUNT(DISTINCT(EnquiryId)) AS `EnquiryRows`
FROM
  `db.ppc_data.adgroup_performance_summary_report` ppc
LEFT JOIN
  `db.enquiries.output_final_scheduled` led
ON CAST(ppc.Date AS DATE) = CAST(led.EnquiryDateTime AS DATE)
WHERE
  SUBSTR(CAST(led.EnquiryDateTime AS STRING), 1, 7) = "2018-01"
GROUP BY 1

Despite being defined as a left join, the data being returned indicates (I think) that this is doing a cross join - the value of the Rows column is the product of PPC Rows and Enquiry Rows:

enter image description here

I don't really want to have to factor COUNT(DISTINCT(whatever)) into all the aggregate columns that I need to add next!

Also, it's taking an age to run - is there a more efficient way of writing this query?

Adam Hopkinson
  • 28,281
  • 7
  • 65
  • 99
  • 2
    Your query is actually doing an `inner join`, but that is not your question. You are clearly missing a `join` condition of some sort, but without sample data or table layouts it is hard to tell what the issue is. – Gordon Linoff Apr 06 '18 at 12:44
  • just checking - above result correspond to statement with LEFT JOIN or with INNER JOIN? confused because I see LEFT JOIN but @GordonLinoff mentioned INNER JOIN – Mikhail Berlyant Apr 06 '18 at 13:08
  • Please [use text, not images/links, for text (including code, tables & ERDs)](https://meta.stackoverflow.com/q/285551/3404097). Use an image only for convenience to supplement text and/or for what cannot be given in text. – philipxy Apr 06 '18 at 17:17
  • 2
    Hi. Find out what left join on returns: inner join on rows plus unmatched left table rows extended by nulls. Since your where is only true for non-null right table columns, it removes any null-extended rows, so it is "turning a left join into an inner join". Also you are aggregating the join, but you probably want the join of two aggregations of joins. Find out what cross join is--it's inner join on true. Please read & act on [mcve]. Examine small input & intermediate calculations to see what's going on. You don't even say what output you want per input, how are we to suggest a solution? – philipxy Apr 06 '18 at 20:27
  • 1
    thank you @philipxy - got it - `where clause effectively transformed left to inner` - I was not sure - I thought OP might corrected his question right after Gordon's comment as it was done looks like within minute or so – Mikhail Berlyant Apr 06 '18 at 20:33

2 Answers2

1

This is definitely not a CROSS JOIN
It would be - if COUNT(1) were a product of COUNT(ppc.ID) and COUNT(EnquiryId).

Meantime, if you are not getting result you expect - please post specific question describing your use case

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

You probably want to aggregate before joining:

SELECT ppd.dte AS Date, ppc.rows as PPCRows,
       led.cnt as `EnquiryRows`
FROM (SELECT CAST(ppc.Date AS DATE) as dte, COUNT(*) as rows
      FROM `db.ppc_data.adgroup_performance_summary_report`
      GROUP BY CAST(ppc.Date AS DATE)
     ) ppc LEFT JOIN
     (SELECT CAST(led.EnquiryDateTime AS DATE) as dte, COUNT(*) as rows
      FROM `db.enquiries.output_final_scheduled` led
      GROUP BY CAST(led.EnquiryDateTime AS DATE)
     ) led
     ON ppc.dte = led.dte
WHERE led.EnquiryDateTime >= '2018-01-01' AND
      led.EnquiryDateTime < '2018-02-01'
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786