3

I am working with the GoogleAnalytics data in the BigQuery. I want to output 2 columns: specific event actions (hits) and custom dimension (session based). All that, using Standard SQL. I cannot figure out how to do it correctly. Documentation does not help either. Please help me. This is what I am trying:

SELECT
  (SELECT MAX(IF(index=80, value, NULL)) FROM UNNEST(customDimensions)) AS is_app,
  (SELECT hits.eventInfo.eventAction) AS ea
FROM  
  `table-big-query.105229861.ga_sessions_201711*`, UNNEST(hits) hits
WHERE 
  totals.visits = 1
  AND _TABLE_SUFFIX BETWEEN '21' and '21'
  AND EXISTS(SELECT 1 FROM UNNEST(hits) hits
    WHERE hits.eventInfo.eventCategory = 'SomeEventCategory'
  )
igsm
  • 1,280
  • 2
  • 17
  • 19
  • 1
    Are you getting some error? Or is the output not expected? I think maybe you should add a `where eventinfo.eventAction = 'SomeEventCategory'` in your second select) but other than that the query seems correct. – Willian Fuks Jan 05 '18 at 17:55
  • Yes, it returns `query: Column name customDimensions is ambiguous at [2:54] `. But, @sco explanation helped. – igsm Jan 08 '18 at 08:49

1 Answers1

9

Try to give your tables and sub-tables names that are not part of the original table schema. Always tell to which table you're referring - when cross joining, you're basically adding new columns (here h.* - flattened) - but the old ones (hits.* - nested) still exist. I named ga_sessions_* t and use it to refer the cross-join and also the customDimension.

Also: You don't need the legacy sql trick using MAX() for customDimensions anymore. It's a simple sub-query now :)

try:

SELECT
  (SELECT  value FROM t.customDimensions where index=80) AS is_app, -- use h.customDimensions if it is hit-scope
  eventInfo.eventAction AS ea
FROM  
  `projectid.dataset.ga_sessions_201711*` t, t.hits h
WHERE 
  totals.visits = 1
  AND _TABLE_SUFFIX BETWEEN '21' and '21'
  AND h.eventInfo.eventCategory is not null
Martin Weitzmann
  • 4,430
  • 10
  • 19
  • Actually, I have got one more question. As I understand this is more an implicit way of CROSS JOINING. In that case, what is the key by which two tables are joined? I assume that currently the joining happens in the scope of each unique session? – igsm Jan 08 '18 at 12:03
  • 1
    Cross joining doesn't need a key - it's all possible row combinations from both tables. Meaning both tables mustn't be NULL. And because every session is one row in `t`, yes, it's session scope. – Martin Weitzmann Jan 08 '18 at 12:29
  • shouldn't the second line read `(SELECT value FROM h.customDimensions where index=80) AS is_app` ? – Grisha Apr 11 '18 at 12:35
  • Anyone getting this error: ```Scalar subquery produced more than one element``` – Jas Jul 11 '19 at 00:02
  • That means your subquery returned more than one value. But a field can only hold one value - so either you feed the output into an `array()` or you `limit 1` within the subquery. If you don't expect more than one value, there's probably something wrong with the data creation process. – Martin Weitzmann Jul 11 '19 at 06:58