What I'm trying to do
Hello, I'm trying to improve GA's granularity to the hit level for a website using custom dimensions, as described in this article. I'm using hit timestamp, session ID, and user ID dimensions.
I've already set everything up in GA and GTM. I tested in GTM preview mode and confirmed that the hit timestamp variable is set and that the GA tag fires correctly. I also confirmed that GA is collecting the data.
Here's an image of the GTM tag configuration:
The problem
The number of hit timestamps I'm collecting is too low compared to GA's built-in metrics. For example, yesterday the website had 80k hits, of which 55k were page views. Since I'm firing the tag on every page view event, I'd expect to collect about 55k timestamps from the custom dimension.
In reality, I'm only getting 9k, consistently only about 20 percent of page views.
Code and Examples
Here's the sample code I'm using to extract some test reports:
def extract_report(analytics, report_request):
    """Run a batchGet request and follow nextPageToken until every page is fetched."""
    data = []
    response = analytics.reports().batchGet(
        body={
            'reportRequests': report_request
        }
    ).execute()
    # 'rows' is omitted from the response when a page has no data
    data.extend(response['reports'][0]['data'].get('rows', []))
    print(f"Extracted {len(data)} rows")
    # Keep requesting pages for as long as the API returns a nextPageToken
    while response['reports'][0].get('nextPageToken'):
        report_request[0]['pageToken'] = response['reports'][0]['nextPageToken']
        response = analytics.reports().batchGet(
            body={
                'reportRequests': report_request,
            }
        ).execute()
        data.extend(response['reports'][0]['data'].get('rows', []))
        print(f"Extracted {len(data)} rows")
    return data
And here's a query that extracts some metrics for a single day:
from datetime import datetime, timedelta


def main():
    analytics = initialize_analyticsreporting()
    # Create string for yesterday's date
    # (currently unused: the query below relies on the API's 'yesterday' keyword)
    yesterday = datetime.now() - timedelta(days=1)
    yesterday_string = yesterday.strftime('%Y-%m-%d')
    # Daily totals: all hits, page views, and unique page views
    hit_queries = [{
        'viewId': VIEW_ID,
        'dateRanges': [{'startDate': 'yesterday', 'endDate': 'yesterday'}],
        'dimensions': [
            {'name': 'ga:date'}
        ],
        'metrics': [
            {'expression': 'ga:hits'},
            {'expression': 'ga:pageviews'},
            {'expression': 'ga:uniquePageviews'}
        ]
    }]
    print('Extracting hit data')
    r = extract_report(analytics, hit_queries)
    print(r)


if __name__ == '__main__':
    main()
This gives me the following result:
Extracting hit data
Extracted 1 rows
[{'dimensions': ['20221002'], 'metrics': [{'values': ['80205', '55293', '13406']}]}]
Which means that on that given day we had 80k total hits, 55k page views, and 13k unique page views.
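To make that reading explicit, here is a minimal sketch that pairs each value in the row above with its metric name. The helper name `metrics_by_name` is hypothetical and not part of the Reporting API; the row is copied from the output above.

```python
def metrics_by_name(row, names):
    """Pair each metric value in a Reporting API v4 row with its expression name."""
    values = row['metrics'][0]['values']
    return {name: int(value) for name, value in zip(names, values)}


# Row copied from the output above
row = {'dimensions': ['20221002'], 'metrics': [{'values': ['80205', '55293', '13406']}]}
totals = metrics_by_name(row, ['ga:hits', 'ga:pageviews', 'ga:uniquePageviews'])

# Share of all hits that are page views
pageview_share = totals['ga:pageviews'] / totals['ga:hits']
print(totals)
print(f"{pageview_share:.1%} of hits are page views")
```

The names have to be passed in the same order as the `metrics` list in the request, since the API returns the values positionally.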
However, when I try to get all the hit timestamps collected with the custom dimension:
from datetime import datetime, timedelta


def main():
    analytics = initialize_analyticsreporting()
    # Create string for yesterday's date
    # (currently unused: the query below relies on the API's 'yesterday' keyword)
    yesterday = datetime.now() - timedelta(days=1)
    yesterday_string = yesterday.strftime('%Y-%m-%d')
    # One row per distinct value of the hit timestamp custom dimension
    hit_queries = [{
        'viewId': VIEW_ID,
        'dateRanges': [{'startDate': 'yesterday', 'endDate': 'yesterday'}],
        'dimensions': [
            {'name': 'ga:dimension3'}
        ],
        'metrics': [
            {'expression': 'ga:hits'}
        ],
        'includeEmptyRows': True
    }]
    print('Extracting hit data')
    r = extract_report(analytics, hit_queries)
    print('Sample data:')
    print(r[:10])
I get this result:
Extracting hit data
Extracted 1000 rows
Extracted 2000 rows
Extracted 3000 rows
Extracted 4000 rows
Extracted 5000 rows
Extracted 6000 rows
Extracted 7000 rows
Extracted 8000 rows
Extracted 9000 rows
Extracted 9150 rows
Sample data:
[{'dimensions': ['1664633132108'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664633912892'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694004446'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694006760'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694039475'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694067927'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694074017'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694118729'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694163081'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['1664694165036'], 'metrics': [{'values': ['1']}]}]
Which means I'm getting only 9k timestamps for the same day.
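One caveat when reading that number: the API returns one row per distinct dimension value, so the row count is the number of distinct timestamps, and hits that share a timestamp value would be merged into a single row with `ga:hits` greater than 1. A minimal sketch to compare the two counts, using a hypothetical helper `summarize_dimension_rows` and three rows copied from the sample output above:

```python
def summarize_dimension_rows(rows):
    """Return (number of distinct dimension values, sum of the first metric)."""
    distinct_values = len({tuple(r['dimensions']) for r in rows})
    total_hits = sum(int(r['metrics'][0]['values'][0]) for r in rows)
    return distinct_values, total_hits


# Three rows copied from the sample output above
sample = [
    {'dimensions': ['1664633132108'], 'metrics': [{'values': ['1']}]},
    {'dimensions': ['1664633912892'], 'metrics': [{'values': ['1']}]},
    {'dimensions': ['1664694004446'], 'metrics': [{'values': ['1']}]},
]
distinct_values, total_hits = summarize_dimension_rows(sample)
print(distinct_values, total_hits)

# Rows returned for the day versus page views reported for the same day
coverage = 9150 / 55293
print(f"{coverage:.1%} of page views have a timestamp row")
```

Running the helper on the full result (`r` in the script above) would show whether the summed `ga:hits` is close to the row count, as it is in the sample, or noticeably larger.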
I've been stuck on this for over a week. I tried several changes with the GTM tag configuration and experimented with different queries and metrics, but I can't get a reasonable number of timestamps from the GA API.
Also, I'm not having any problems with the other dimensions. They're working fine. My only problem is with hit_timestamp.
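The timestamp values themselves can be sanity-checked by decoding them. This is a minimal sketch that assumes the 13-digit values are Unix epoch milliseconds, which is what they look like but is not confirmed by the tag configuration; `epoch_ms_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(value):
    """Convert an epoch-milliseconds string to a timezone-aware UTC datetime."""
    ms = int(value)
    # Integer arithmetic avoids float rounding in the millisecond part
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)


# Two values copied from the sample output above
for raw in ['1664633132108', '1664694004446']:
    print(raw, epoch_ms_to_utc(raw).isoformat())
```

The decoded times are in UTC, while `ga:date` is reported in the view's time zone, so a value can fall on a different calendar day than the report date.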
What am I doing wrong? Did I configure something incorrectly, or am I making a mistake in my queries?