We need to extract the events table from GA4 through BigQuery (rather than connecting to the Google API directly, which limits both the number of rows and the number of dimensions/metrics). However, since the table has several nested columns, ADF reads the data in the following format:
{
"v": [{
"v": {
"f": [{
"v": "firebase_conversion"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "0"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "ga_session_id"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "123"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "engaged_session_event"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "1"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "ga_session_number"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "9"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_referrer"
}, {
"v": {
"f": [{
"v": "ABC"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_title"
}, {
"v": {
"f": [{
"v": "ABC"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "page_location"
}, {
"v": {
"f": [{
"v": "xyz"
}, {
"v": null
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}, {
"v": {
"f": [{
"v": "session_engaged"
}, {
"v": {
"f": [{
"v": null
}, {
"v": "1"
}, {
"v": null
}, {
"v": null
}]
}
}]
}
}]
}
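For context, this `f`/`v` nesting is how BigQuery's tabledata REST API serializes a repeated RECORD column (here `event_params`): each `f` is an ordered list of fields and each `v` a value, with the key in slot 0 and the four typed value slots (string, int, float, double) nested under slot 1, mostly null. A minimal Python sketch of a decoder for one such cell (function names are my own, not part of any library):

```python
def decode_param(param_v):
    """Decode one event_params entry from the f/v wire format.

    f[0] holds the param key; f[1] wraps the four typed value
    slots (string, int, float, double), of which at most one is set.
    """
    fields = param_v["v"]["f"]
    key = fields[0]["v"]
    value_slots = [slot["v"] for slot in fields[1]["v"]["f"]]
    # Keep the first non-null slot; the rest are the null placeholders.
    value = next((v for v in value_slots if v is not None), None)
    return key, value

def decode_event_params(cell):
    """Flatten the whole repeated column into a plain dict."""
    return dict(decode_param(p) for p in cell["v"])
```

Applied to the sample above, this would yield a flat dict such as `{"firebase_conversion": "0", "ga_session_id": "123", ...}`, with the null placeholders already stripped.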
Unnesting is a problem: there are several columns with this structure, and UNNEST multiplies the number of rows (3.5 mn records become ~40 mn). The current plan is to extract the data as-is and use an Azure Function in Python to flatten it into plain JSON, but the null values are causing trouble there as well.
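To illustrate the row multiplication (a toy sketch with hypothetical, already-decoded events): an UNNEST-style explode emits one row per (event, param) pair, whereas pivoting the known param keys into columns keeps one row per event, with absent keys becoming nulls instead of extra rows.

```python
# Hypothetical mini-batch of events, params already decoded to dicts.
events = [
    {"event_name": "page_view",
     "params": {"ga_session_id": "123", "page_title": "ABC"}},
    {"event_name": "click",
     "params": {"ga_session_id": "123", "session_engaged": "1"}},
]

# UNNEST-style explode: one row per (event, param) pair -> row count grows.
exploded = [(e["event_name"], k, v)
            for e in events for k, v in e["params"].items()]

# Pivot instead: known keys become columns, one row per event.
COLUMNS = ["ga_session_id", "page_title", "session_engaged"]
pivoted = [
    {"event_name": e["event_name"],
     **{c: e["params"].get(c) for c in COLUMNS}}
    for e in events
]

print(len(exploded))  # 4 rows after explode
print(len(pivoted))   # 2 rows after pivot, same as the input
```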
Can someone suggest the best way to land this data in the data lake on a daily basis, in the desired flat format, without the row multiplication?