
The issue

I exceed the GitHub API rate limit when I use PyGitHub – but not when I use bare-bones HTTP requests.

This is strange because my custom client can get what I need in ≈500 requests, while PyGitHub exceeds the 5,000-request limit to get the same result.

5k requests/hr

My application uses basic auth with a personal access token.

from github import Github
g = Github(token)

The GitHub API docs on rate limiting say I have 5,000 requests per hour for authenticated requests:

For API requests using Basic Authentication or OAuth, you can make up to 5000 requests per hour. Authenticated requests are associated with the authenticated user, regardless of whether Basic Authentication or an OAuth token was used.
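
As a sanity check, the current quota can be read from the /rate_limit endpoint, which does not count against the limit. A minimal sketch with PyGitHub, assuming a recent release where get_rate_limit() exposes the core category:

from github import Github

g = Github('<token>')

core = g.get_rate_limit().core                 # /rate_limit itself does not count against the quota
print(core.limit, core.remaining, core.reset)  # limit should read 5000 for authenticated requests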

The operation

The gist is to scan all the repos in my org, filter by topic, and download specific directories.

Here's the pseudocode for how I'm doing this with my custom client. This can be done in ≈500 requests (scanning ≈10k repos and downloading directories from ≈10 of them).

# Get the first page...
/orgs/<my-org>/repos

# Iterate through the pages...
for page in pages:
    /organizations/<my-org>/repos?page=page

# Filter repos by topic(s)...
repos = [repo for repo in repos if '<my-topic>' in repo['topics']]

# Download some content from qualifying repositories
for repo in repos:
    /repos/<my-org>/<repo-name>/contents/<my-path>
    ...recursively crawl directory at <my-path>

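For completeness, here is a runnable sketch of that custom client using the requests library. ORG, TOPIC, PATH, and the token are placeholders standing in for the values elided above; the endpoints and paging match GitHub's documented REST API.

import requests

API = 'https://api.github.com'
ORG, TOPIC, PATH = '<my-org>', '<my-topic>', '<my-path>'  # placeholders

session = requests.Session()
session.headers['Authorization'] = 'token <token>'
# topics used to require the mercy-preview media type; harmless on newer API versions
session.headers['Accept'] = 'application/vnd.github.mercy-preview+json'

def list_org_repos(org):
    # Page through /orgs/<org>/repos at 100 per request (~100 requests for ~10k repos)
    repos, page = [], 1
    while True:
        resp = session.get(f'{API}/orgs/{org}/repos',
                           params={'type': 'all', 'per_page': 100, 'page': page})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return repos
        repos.extend(batch)
        page += 1

def crawl(org, repo, path):
    # Recursively walk /repos/<org>/<repo>/contents/<path>; path is assumed to be a directory
    resp = session.get(f'{API}/repos/{org}/{repo}/contents/{path}')
    resp.raise_for_status()
    for item in resp.json():
        if item['type'] == 'dir':
            yield from crawl(org, repo, item['path'])
        else:
            yield item  # item['download_url'] points at the raw file

repos = [r for r in list_org_repos(ORG) if TOPIC in r.get('topics', [])]
for repo in repos:
    for item in crawl(ORG, repo['name'], PATH):
        print(repo['name'], item['path'])
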
Here's the analogous approach using PyGitHub, which exceeds the rate limit.

from github import Github

g = Github('<token>')
org = g.get_organization('<my-org>')
repos = org.get_repos(type='all')
repos = [repo for repo in repos if '<my-topic>' in repo.get_topics()]
for repo in repos:
    contents = repo.get_contents('<my-path>')
    # ...recursively crawl directory at <my-path>
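
To compare the two approaches, the remaining quota can be sampled before and after the scan. A small sketch, assuming PyGitHub's rate_limiting property, which reports (remaining, limit) from the most recent response headers:

from github import Github

g = Github('<token>')

remaining_before = g.rate_limiting[0]  # requests remaining before the scan
org = g.get_organization('<my-org>')
repos = [r for r in org.get_repos(type='all') if '<my-topic>' in r.get_topics()]
remaining_after = g.rate_limiting[0]   # requests remaining after the scan

print('requests consumed by the scan:', remaining_before - remaining_after)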
