
I am downloading data from Google Analytics using the Reporting API v4. The data comes through, and I am trying to use the pageToken parameter to request the next page whenever pageSize is exceeded. However, my pagination function, which should pass the new pageToken into a new request, gets stuck in a loop where it endlessly repeats the same first request: the line print(response['reports'][0]['nextPageToken']) always prints the pageSize value, which is the nextPageToken returned by the very first request.

The query should produce ~8000 results/rows.

What I tried was to create a variable for the pageToken parameter in the request and have that variable take the nextPageToken value in the new request made by the recursive function:

pageTokenVariable = "whatever"

sample_request = {
  'viewId': '1234',
  'dateRanges': {
      'startDate': datetime.strftime(datetime.now() - timedelta(days = 1),'%Y-%m-%d'),
      'endDate': datetime.strftime(datetime.now(),'%Y-%m-%d')
  },
  'dimensions': [
      {'name': 'ga:date'},
      {'name': 'ga:eventlabel'}
  ],
  'metrics': [
      {'expression': 'ga:users'},
      {'expression': 'ga:totalevents'}
  ],
  'pageToken': pageTokenVariable,
  'pageSize': 1000
}

# pagination function
def main(client, pageTokenVariable):

    response = client.reports().batchGet(
    body={
        'reportRequests':sample_request
    }).execute()

    if 'nextPageToken' in response['reports'][0]:
        print(response['reports'][0]['nextPageToken']) # trying to debug
        pageTokenVariable = response['reports'][0]['nextPageToken']
        response = main(client, pageTokenVariable)

    return response

Nonetheless, it does not work as intended. What am I missing?

E. Faslo

3 Answers


Here is the final code after 12 hours of work. It works for more than 100K rows and for historical data.

"""Author             :AMARNADH G(INDIA)
   Date last modified :2020-12-12
   Description        :Pulls Google Anlytics data with pagination and unsampled data
   Comments           :Dimentions, Metrics and DateRanges are dynamic in nature in which daterange is parameterised""" 

### GOOGLE ANALYTICS V4

from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from datetime import datetime, timedelta
import io

todayStr = datetime.today().strftime('%Y-%m-%d')
YstrdyInt = datetime.today() - timedelta(days=1)
YstrdyStr = datetime.strftime(YstrdyInt, '%Y-%m-%d')

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = r'C:\Users\Amarnadh\Desktop\Python\Secret.json'
VIEW_ID = 'XXXX'
PAGESIZE = 100000


def initialize_analyticsreporting():

    credentials = \
        ServiceAccountCredentials.from_json_keyfile_name(KEY_FILE_LOCATION,
            SCOPES)

  # Build the service object.

    analytics = build('analyticsreporting', 'v4',
                      credentials=credentials)

    return analytics


def get_PT(response):
    pageToken = None  # stays None if the response contains no reports
    for report in response.get('reports', []):
        columnHeader = report.get('columnHeader', {})
        dimensionHeaders = columnHeader.get('dimensions', [])
        metricHeaders = columnHeader.get('metricHeader',
                {}).get('metricHeaderEntries', [])
        pageToken = report.get('nextPageToken', None)
        print(str(pageToken) + ' at 43')
    return pageToken


def get_report(analytics, pageToken='unknown'):

    return analytics.reports().batchGet(body={'reportRequests': [{
        'viewId': VIEW_ID,
        'pageSize': PAGESIZE,
        'samplingLevel': 'LARGE',
        'pageToken': pageToken,
        'dateRanges': [{'startDate': '2020-10-11',
                       'endDate': '2020-12-11'}],
        'metrics': [{'expression': 'ga:sessions'}],
        'dimensions': [
            {'name': 'ga:longitude'},
            {'name': 'ga:latitude'},
            {'name': 'ga:country'},
            {'name': 'ga:region'},
            {'name': 'ga:date'},
            {'name': 'ga:pagePath'},
            ],
        }]}).execute()


def print_response(response):

    f = io.open('Essex_GA_Geo' + todayStr + '.txt', 'a+',
                encoding='utf-8')
    for report in response.get('reports', []):
        columnHeader = report.get('columnHeader', {})
        dimensionHeaders = columnHeader.get('dimensions', [])
        metricHeaders = columnHeader.get('metricHeader',
                {}).get('metricHeaderEntries', [])

    # pageToken=report.get('nextPageToken', None)

    # print(pageToken)

        print(columnHeader)

      # writing dimension header

        for D_header in dimensionHeaders:
            f.write(str.capitalize(str.replace(D_header, 'ga:', ''))
                    + '|')

        # print(D_header)

        for M_header in list(columnHeader['metricHeader'
                             ]['metricHeaderEntries']):
            f.write(str.capitalize(str.replace(M_header['name'], 'ga:',
                    '')) + '|')

        f.write('\n')

        for row in report.get('data', {}).get('rows', []):
            dimensions = row.get('dimensions', [])
            Metrics = row.get('metrics', [])

    # writing dimension header row data

            for dimension in dimensions:
                f.write(dimension + '|')

    # writing metric header

            for (i, values) in enumerate(Metrics):
                for (metricHeader, value) in zip(metricHeaders,
                        values.get('values')):
                    f.write(value + '|')
            f.write('\n')
    f.close()


def main():
    analytics = initialize_analyticsreporting()
    response = get_report(analytics)

    pageToken = get_PT(response)

    print(str(pageToken) + ' at 108')

    print_response(response)

    while pageToken:
        print('inside while ' + str(pageToken))
        analytics = initialize_analyticsreporting()
        response = get_report(analytics, pageToken)
        pageToken = get_PT(response)
        print_response(response)
        print(str(pageToken) + ' at 118')


if __name__ == '__main__':
    main()
amarnadh

You need to do something like this:

### Something like this works for me

list = [] #I usually store the output of the pagination in a list

# pagination function
def main(client, pageTokenVariable):
    return client.reports().batchGet(
        body={
            'reportRequests': [
            {
            'viewId': '123',
            "pageToken": pageTokenVariable,
            #All your other stuff like dates etc goes here
            }]
        }
    ).execute()

response = main(client, "0")

for report in response.get('reports', []): #All the stuff you want to do
    pagetoken = report.get('nextPageToken', None) #Get your page token from the FIRST request and store it in a variable
    columnHeader = report.get('columnHeader', {})
    dimensionHeaders = columnHeader.get('dimensions', [])
    metricHeaders = columnHeader.get('metricHeader', {}).get('metricHeaderEntries', [])
    rows = report.get('data', {}).get('rows', [])
    for row in rows:
        # create dict for each row
        dict = {}
        dimensions = row.get('dimensions', [])
        dateRangeValues = row.get('metrics', [])

        # fill dict with dimension header (key) and dimension value (value)
        for header, dimension in zip(dimensionHeaders, dimensions):
          dict[header] = dimension

        # fill dict with metric header (key) and metric value (value)
        for i, values in enumerate(dateRangeValues):
          for metric, value in zip(metricHeaders, values.get('values')):
            #set int as int, float as float
            if '.' in value or ',' in value:
              dict[metric.get('name')] = float(value)
            else:
              dict[metric.get('name')] = int(value)
        list.append(dict) #Append that data to a list as a dictionary

    while pagetoken: #While there is a nextPageToken, fetch the next page, process it and add it to the list
        response = main(client, pagetoken)
        for report in response.get('reports', []):
            pagetoken = report.get('nextPageToken', None) #Update the token; it is None on the last page
            rows = report.get('data', {}).get('rows', [])
            for row in rows:
                # create dict for each row
                dict = {}
                dimensions = row.get('dimensions', [])
                dateRangeValues = row.get('metrics', [])

                # fill dict with dimension header (key) and dimension value (value)
                for header, dimension in zip(dimensionHeaders, dimensions):
                    dict[header] = dimension

                # fill dict with metric header (key) and metric value (value)
                for i, values in enumerate(dateRangeValues):
                    for metric, value in zip(metricHeaders, values.get('values')):
                        #set int as int, float as float
                        if '.' in value or ',' in value:
                            dict[metric.get('name')] = float(value)
                        else:
                            dict[metric.get('name')] = int(value)
                list.append(dict) #Append that data to the list as a dictionary

#So to recap:
#You make an initial call to your function, passing a page token to get it started.
#Get the nextPageToken, process the data and append it to the list.
#If there is data in nextPageToken, call the function again, process it and add to the list until nextPageToken is empty.
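For reference, here is a minimal, consolidated sketch of the same loop as a single function. It assumes `analytics` is an authorized Reporting API v4 service object; the view ID, dates, metrics and dimensions in the body are placeholders to swap for your own.

def fetch_all_rows(analytics, view_id):
    all_rows = []
    page_token = '0'  # start with the first page
    while page_token is not None:
        response = analytics.reports().batchGet(body={
            'reportRequests': [{
                'viewId': view_id,
                'pageToken': page_token,  # token from the previous response
                'pageSize': 10000,
                'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
                'metrics': [{'expression': 'ga:users'}],
                'dimensions': [{'name': 'ga:date'}],
            }]
        }).execute()
        report = response['reports'][0]
        all_rows.extend(report.get('data', {}).get('rows', []))
        page_token = report.get('nextPageToken')  # None once the last page is reached
    return all_rows
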
Jeff
  • Hi @Jeff, I am using your code, but for some reason, even when the `pageToken` is updated, I am still getting the same data as in the first request. I have a question about this problem with a bounty; it would be amazing if you could take a look. – Jonas Palačionis Jan 20 '20 at 07:43

I do not know if this is a possible answer, but have you considered removing the pageSize and adding the max-results parameter?

This option allows you to query up to 10,000 elements, and, if you have more than 10,000, you can use the start-index option to start at 10,000, 20,000, etc.

You can always know how many results there are in total, because the totalResults field in the response contains this information.
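
For illustration, a rough sketch of that approach with the v3 Core Reporting API (assuming `service` was built with build('analytics', 'v3', ...); the view ID and the query fields below are placeholders):

# Page through v3 Core Reporting API results with max-results / start-index.
start_index = 1
all_rows = []
while True:
    result = service.data().ga().get(
        ids='ga:12345678',        # placeholder view (profile) ID
        start_date='7daysAgo',
        end_date='today',
        metrics='ga:sessions',
        dimensions='ga:date',
        max_results=10000,
        start_index=start_index,
    ).execute()
    all_rows.extend(result.get('rows', []))
    if start_index + 10000 > result.get('totalResults', 0):
        break  # no more pages
    start_index += 10000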

luis.galdo
  • My bad. I had written Google Analytics Reporting API v3 in my post, while I am actually using v4. Indeed, the parameters you indicated work with v3 [and not v4](https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/reports/batchGet). – E. Faslo Feb 01 '19 at 13:55
  • Then [this](https://stackoverflow.com/questions/43657140/google-analytics-api-v4-max-results) might be what you are looking for. pageSize can be up to 10,000. – luis.galdo Feb 01 '19 at 14:00