
Background story: We use Google Analytics to track user behaviour on our website. The data is exported daily into BigQuery. Our implementation is quite complex and uses a lot of custom dimensions.

Requirements: 1. The data needs to be imported into our internal databases to enable better and more strategic insights. 2. The process needs to run without requiring human interaction.

The problem: Google Analytics data needs to be in a flat format so that we can import it into our database.

Question: How can I unnest custom dimensions data using Google Data Prep?

What it looks like:

----------------
customDimensions
----------------
[{"index":10,"value":"56483799"},{"index":16,"value":"·|·"},{"index":17,"value":"N/A"}]

What I need it to look like:

----------------------------------------------------------
customDimension10 | customDimension16 | customDimension17
----------------------------------------------------------
56483799          | ·|·                | N/A

I know how to achieve this using a standard SQL query in the BigQuery interface, but I really want a Google Dataprep flow that does it automatically.
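For illustration, the reshaping described above can be sketched in Python (the function name is hypothetical; in practice this would be a BigQuery query or a Dataprep recipe step). It takes the raw `customDimensions` JSON array from the sample row and produces one flat column per index:

```python
import json

def flatten_custom_dimensions(raw_json):
    """Turn a GA customDimensions JSON array into a flat dict of
    customDimension<index> -> value columns (illustrative helper)."""
    dimensions = json.loads(raw_json)
    return {f"customDimension{d['index']}": d["value"] for d in dimensions}

row = '[{"index":10,"value":"56483799"},{"index":16,"value":"·|·"},{"index":17,"value":"N/A"}]'
print(flatten_custom_dimensions(row))
# {'customDimension10': '56483799', 'customDimension16': '·|·', 'customDimension17': 'N/A'}
```

Indexes that are missing from a given row would simply be absent from the dict, which mirrors the NULL columns you would get from a SQL pivot.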

  • By automatic do you mean a process that runs on some interval? Maybe you can use scheduled queries: https://cloud.google.com/bigquery/docs/scheduling-queries – Elliott Brossard Nov 07 '18 at 14:50
  • @ElliottBrossard Yes - I need it to run daily at a set time. A scheduled query seems like a possible solution, so I will give it a try. Thank you for pointing it out. I was hoping to use Dataprep so that it is more reliable (scheduled queries are a feature in beta and could change or disappear) and easier to work with - a nice interface, no need to write SQL, and no workaround for selecting the most recent table and setting a well-named destination table. – Yordan Vasilev Nov 08 '18 at 15:45

1 Answer


Define the flat format and create it in BigQuery first. You could

  • create one big table and repeat several values using CROSS JOINs on all the arrays in the table
  • create multiple tables (per array) and use ids to connect them, e.g.
    • for session custom dimensions concatenate fullvisitorid / visitstarttime
    • for hits concatenate fullvisitorid / visitstarttime / hitnumber
    • for products concatenate fullvisitorid / visitstarttime / hitnumber / productSku

The second option is a bit more effort, but you save storage because you're not repeating all the information in every row.
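As a rough Python sketch of the second option's key scheme (the field names follow the GA BigQuery export schema; the helper names and the `-` separator are assumptions for illustration):

```python
def session_key(row):
    # Session-level id: fullVisitorId + visitStartTime
    return f"{row['fullVisitorId']}-{row['visitStartTime']}"

def hit_key(row):
    # Hit-level id adds hitNumber to the session key
    return f"{session_key(row)}-{row['hitNumber']}"

def product_key(row):
    # Product-level id additionally includes the productSKU
    return f"{hit_key(row)}-{row['productSKU']}"

row = {"fullVisitorId": "123", "visitStartTime": 1541600000,
       "hitNumber": 4, "productSKU": "GGOEA0001"}
print(product_key(row))  # 123-1541600000-4-GGOEA0001
```

The same keys would be built with string concatenation in SQL, so the session, hit, and product tables can be joined back together later without repeating every column.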

Martin Weitzmann
  • I would rather go with the first option and create one big table as this is the preferred way to import it into our database. The question still stands though - can that be achieved by using Google Cloud Dataprep? Thanks – Yordan Vasilev Nov 08 '18 at 08:34
  • I don't know - I would do it in BigQuery beforehand as it is usually much faster. – Martin Weitzmann Nov 08 '18 at 09:36