-1

I am trying to create a view with the following fields: Date, values for custom dimension X, values for custom dimension Y, visitor ID, pageviews, values for content group A, values for content group B, event category, event action.

I am aware this is a lot of fields and I'm having a really hard time coming up with a solution.

Right now, I have created one view with Date, values for custom dimension X, values for custom dimension Y, visitor ID, pageviews.

I then created a second view, with Date, visitor ID, values for content group A, values for content group B, event category, event action.

I joined the two on Date and Visitor ID.

-- code below --

CREATE VIEW `TEST_1`
AS

SELECT
date AS Date,
(SELECT value FROM UNNEST(customDimensions) WHERE index=2) AS Institution_ID,
(SELECT value FROM UNNEST(customDimensions) WHERE index=3) AS Institution_Name,
fullVisitorId AS Visitor_ID,
SUM(totals.pageviews) AS Pageviews
FROM
`ga_sessions_*`
GROUP BY
date,
Visitor_ID,
Institution_ID,
Institution_Name

CREATE VIEW `TEST_2`
AS

select 
DATE as DATE,
fullVisitorId AS Visitor_ID,
hits.contentGroup.contentGroup3 as Topic_Name,
hits.contentGroup.contentGroup5 as Topic_ID,
hits.eventInfo.eventCategory as Event_Category,
hits.eventInfo.eventAction as Event_Action
FROM
`ga_sessions_*`
, UNNEST (HITS) AS HITS


CREATE VIEW `TEST_3` AS
SELECT A.DATE, 
A.VISITOR_ID, 
Institution_ID, 
Institution_Name,  
Pageviews, 
Topic_Name, 
Topic_ID, 
Event_Category, 
Event_Action  
FROM `TEST_1` A 
JOIN `TEST_2` B ON A.DATE=B.DATE AND A.VISITOR_ID=B.VISITOR_ID

The queries run with no error, but the number of pageviews is totally wrong and much much higher than what is reported in GA. I suspect I'm duplicating rows inefficiently here.

Can someone help me out understand what I am doing wrong? Thank you so much!

Jurij
  • 33
  • 5
  • Please provide sample data, desired results, and an explanation of the logic you want to implement. A non-working query does not have this information. – Gordon Linoff Mar 09 '20 at 17:56

1 Answers1

0

This article explains the format and schema of the Google Analytics data that is imported into BigQuery and here it is a graphical representation .

This BigQuery cookbook contains examples of how to construct queries of the Analytics data you export to BigQuery.

Please, take a look at this in order to get a deeper understanding of the UNNEST() function which works as a "CROSS JOIN" in "1 to many relationship".

Please, bear in mind I don't know your data. Having said that, it seems that in your case, when you create view 3 which is a join between view 1 and view 2, a CROSS JOIN might be occurring in a "Many to Many relationship". Perhaps you should try the following:

SELECT
date AS Date,
(SELECT value FROM UNNEST(hits.customDimensions) WHERE index=2) AS Institution_ID,
(SELECT value FROM UNNEST(hits.customDimensions) WHERE index=3) AS Institution_Name,
fullVisitorId AS Visitor_ID,
SUM(totals.pageviews) AS Pageviews,  
hits.contentGroup.contentGroup3 as Topic_Name,
hits.contentGroup.contentGroup5 as Topic_ID,
hits.eventInfo.eventCategory as Event_Category,
hits.eventInfo.eventAction as Event_Action
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST (hits) AS hits
GROUP BY
date,
Visitor_ID,
Institution_ID,
Institution_Name,
hits.contentGroup.contentGroup3,
hits.contentGroup.contentGroup5,
hits.eventInfo.eventCategory,
hits.eventInfo.eventActio
Rafael L
  • 11
  • 1
  • I modified this query with `COUNT(*) as Total_Events` in the SELECT clause to see how many events occurred for each type of event category. What's confusing is that even though some rows of Event_Category had null values, there was a value in the Total_Events greater than 0. Do you know how that's possible? – techscolasticus Oct 23 '20 at 19:45