
I am new to Python and currently handling one of my freelancing assignments. In this project I am given an xls file of subject names which gets updated with new names on a weekly basis. I am able to scrape data for each given name and, through Python, insert the obtained data into a Google Sheet. The file has more than 5,000 names for now. I thought my code was ready, but after 8-10 names I encountered error 429, which states that the quota limit was exceeded. I checked on the website and it seems Google allows a limit of 500 requests per 100 seconds per project, and 100 requests per 100 seconds per user. Taking this limitation into consideration, I changed my code and added a sleep so that the error would not occur, but it seems I am misunderstanding something: by my reckoning, my code makes 7 requests per loop iteration and runs 9 iterations before executing sleep(500), yet I still face the same error. I am sure I am missing something obvious, but after 3 days of trying to figure it out on my own I am losing confidence, so any help is appreciated. Below is the code for reference.

import requests
from bs4 import BeautifulSoup
import gspread
import pandas as pd
from oauth2client.service_account import ServiceAccountCredentials
from pandas import ExcelWriter
import time


# define the scope
scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']

# add credentials to the account
creds = ServiceAccountCredentials.from_json_keyfile_name('/content/drive/MyDrive/ListUpdate.json', scope)

# authorize the clientsheet
client = gspread.authorize(creds)

# get the instance of the Spreadsheet
sheet = client.open('JP_combined_Strip')

# get the first sheet of the Spreadsheet
sheet_instance = sheet.get_worksheet(0)


list_of_lists = sheet_instance.get_all_values()        # req 1
print(len(list_of_lists))

start = int((sheet_instance.cell(2, 1).value))         # req 2: this cell records the last row processed, so the next run can continue from there
end = len(list_of_lists) + 1

for i in range(start, end, 1):

    ##
    ## code for scraping
    ##
    ## scraped data
    ##

    sheet_instance.update_cell(i, 3, data_1)    # req 3
    sheet_instance.update_cell(i, 4, data_2)    # req 4
    sheet_instance.update_cell(i, 5, data_3)    # req 5
    sheet_instance.update_cell(i, 6, data_4)    # req 6
    sheet_instance.update_cell(i, 7, data_5)    # req 7
    sheet_instance.update_cell(i, 8, data_6)    # req 8
    sheet_instance.update_cell(i, 9, data_7)    # req 9 (7th request inside the loop)
    if i % 9 == 0:
        sheet_instance.update_cell(2, 1, i)     # 8th request inside the loop; 9 iterations * 7 = 63 requests
        ## with the 2 requests before the loop, at most 66 requests are made before each sleep,
        ## which is below the 100 requests per 100 seconds stated by Google
        print("sleep")
        time.sleep(500)

The code runs successfully until the first sleep; 7 records do get processed, but the next batch fails with this error.

Paul Lena
  • Check out the error handling documentation on 429s for Gmail and apply accordingly: https://developers.google.com/gmail/api/guides/handle-errors#resolve_a_429_error_too_many_requests – Rafa Guillermo Apr 27 '21 at 09:02
  • But I did; I checked the limitations, and they state 100 requests per 100 seconds per user. As per my calculations, a maximum of 66 requests are made in every 500 seconds. When I looked online for solutions to similar problems, everyone I checked solved it, or at least got it working, by using sleep, which I am also doing, but I am still encountering this error. I really don't know what more to look for in the documentation. – Paul Lena Apr 27 '21 at 10:11
  • Imagine the limit as a time-dependent rolling window. Even if you make only 200 requests in a day, but make them in the space of 10 seconds, you'll go over both the 500-requests-per-100-seconds and the 100-requests-per-100-seconds limits. This might cause the API to return 429. If you have a lot of requests to make, try to space them out randomly with something like `sleep(randint(100,600))`. Alternatively, try making [batch requests](https://developers.google.com/calendar/batch). – Rafa Guillermo Apr 28 '21 at 10:07
  • Hi there, my apologies; I thought I had posted my reply when you commented, but it seems there was an issue, as I cannot find it. My logic was that Google gives 100 requests per 100 seconds per user and resets the count after 100 seconds. In my code, to my understanding, the 500-second sleep starts after 65 requests: only 2 requests happen before the loop, and 7 requests are made per iteration over 9 iterations before the sleep, giving (2 + (7 * 9)) = 65 requests. The quota limit error is still happening and I am not sure why. – Paul Lena May 10 '21 at 14:52
  • The only reason I can think of for the error is a misunderstanding in my own logic above. – Paul Lena May 10 '21 at 14:52
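The randomized spacing suggested in the comment above could be sketched like this (a minimal sketch; `jittered_delay` and its defaults are illustrative, not from the original code):

```python
import random
import time


def jittered_delay(base=60.0, spread=30.0):
    # Return a randomized delay (base +/- spread seconds) so successive
    # batches of requests don't hit the API in lockstep.
    return base + random.uniform(-spread, spread)


# Hypothetical use between batches of Sheets requests:
# time.sleep(jittered_delay())
```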

1 Answer


The issue is that you sleep only after a certain number of requests and ignore that the quota can be exceeded anywhere in between, so every API call is a potential failure.

There are many solutions to this issue. The best, from my perspective, is to wrap every call in a function with a try/except block and the sleep logic inside it.

import time

def api_call_handler(func):
  # Retry up to 10 times with exponential backoff (1, 2, 4, ... seconds).
  for i in range(0, 10):
    try:
      return func()
    except Exception as e:
      print(e)
      time.sleep(2 ** i)
  print("Couldn't reach the Google Spreadsheet API after 10 attempts. Giving up; check it manually.")
  raise SystemError

Example of usage for this code:

# Before
sheet_instance.update_cell(i, 3, data_1)
# Now
api_call_handler(lambda: sheet_instance.update_cell(i, 3, data_1))

This solution adds extra structure and makes the code wordier, but it is bullet-proof.
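Another angle is to reduce the number of requests in the first place: gspread's `Worksheet.update()` accepts a cell range and a 2-D list of values, so all seven values for a row can be written in one request instead of seven `update_cell()` calls. A sketch, assuming the same C-I columns as the question (`row_payload` is a hypothetical helper):

```python
def row_payload(*values):
    # gspread's update() expects a list of rows; here we build a single row.
    return [list(values)]


# Hypothetical use inside the question's loop (requires the authorized
# `sheet_instance` from the question; data_1..data_7 are the scraped values):
# sheet_instance.update(f"C{i}:I{i}",
#                       row_payload(data_1, data_2, data_3, data_4,
#                                   data_5, data_6, data_7))
```

This cuts the per-row cost from seven requests to one, which on its own keeps the loop well under the 100-requests-per-100-seconds limit.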

Cookie