0

I managed to join two tables together on BigQuery, however, I don't think it matches correctly. clients have descriptive client names with 75 rows, while stats contain client stats with 37342 rows. I'm trying to join these two tables together using their ExternalCustomerId with this query:

SELECT 
clients.AccountDescriptiveName AS client_name,
stats.ExternalCustomerId AS client_id,
AverageCost,
AverageCpc,
AverageCpm,
AveragePosition
FROM `298114322003.google_ads1.p_Customer_2670156874` AS clients
JOIN `298114322003.google_ads1.p_AccountStats_2670156874` AS stats
ON clients.ExternalCustomerId = stats.ExternalCustomerId

However, the results came out to 113026 rows. I expect the results to be 37342 since that's the results of the stats table. I've used RIGHT/LEFT JOIN function and it still came out the same. Any suggestions on how I should tackle this problem? Thank you!

AK

  • 2
    Well the query you posted above is already using an inner join, which is _more_ restrictive than an outer join. This means that switching to a left or right join could only result in _more_ records than 37,342 in the result set. – Tim Biegeleisen Nov 15 '21 at 16:04
  • 2
    Your sql has no issues so this is almost definitely an issue with your assumptions about your data. I suspect that the `ExternalCustomerID` is not unique in either table and so you are getting a sort of cartesian product. Post some sample data and your desired results and we can probably help out more. – JNevill Nov 15 '21 at 16:11

1 Answers1

4

It looks like your stats table has mutiple rows for the same ExternalCustomerId (and it can be understandable for example if it is partitioned and you have different data during the days).

Try to explore a little bit more as it follows:

SELECT count(*) as total, count(distinct ExternalCustomerId) as uniques
FROM `298114322003.google_ads1.p_AccountStats_2670156874`

If you have duplicated ExternalCustomerId then every client row will be multiplied by the number of matching rows in the stats table.

Gumaz
  • 239
  • 1
  • 4
  • 1
    Thanks for the info! This was def helpful, cause after discussing with our SEM manager, we have a few inactive clients but they still have a Google Ads account just in case they want to reactivate it back – hence the inactive IDs are still recorded, and no data is shown. Will look more into this. Again, thank you. – smoothiegirl97 Nov 15 '21 at 20:20