I need to download all the audit logs from my organization in GitHub.

The problem is that I have dozens of repositories and 2 years of commit history, so it is a lot of data, and requesting it manually would be impossible.

Does anyone know of a tool or a method to retrieve all the information in the GitHub audit log? Or at least per repository?

Thanks.

GoodDeeds
costamatrix

3 Answers

If you want to download audit log data, you can do so with the v4 GraphQL API, which provides a way to access audit log entries. This information is not available with the v3 REST API.

If you want to retrieve just the commit history, which is different, then the easiest way to do that is to clone the repositories. Reading every commit through the API is inefficient and you'll likely hit the rate limit pretty quickly. You can, however, use the API to discover which repositories you have and script it.
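As a rough sketch of the "discover and script it" part: the snippet below pages through the REST endpoint `GET /orgs/{org}/repos` and builds one `git clone --mirror` command per repository. The function names (`fetch_json`, `list_org_repos`, `clone_commands`) and the `mirrors` destination directory are made up for illustration, and it uses only the standard library; treat it as a starting point under those assumptions, not a finished tool.

```python
import json
import urllib.request

API = "https://api.github.com"


def fetch_json(url, token):
    """GET a GitHub REST URL and decode the JSON body."""
    req = urllib.request.Request(url, headers={
        "Authorization": "token " + token,
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_org_repos(org, token, fetch=fetch_json):
    """Collect every repository of an organization, 100 per page."""
    repos, page = [], 1
    while True:
        batch = fetch(f"{API}/orgs/{org}/repos?per_page=100&page={page}", token)
        if not batch:  # an empty page means there is nothing left
            return repos
        repos.extend(batch)
        page += 1


def clone_commands(repos, dest="mirrors"):
    """Build one 'git clone --mirror' argument list per repository."""
    return [
        ["git", "clone", "--mirror", repo["clone_url"], f"{dest}/{repo['name']}.git"]
        for repo in repos
    ]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`. Private repositories also need Git credentials configured for HTTPS, or you can use the `ssh_url` field instead of `clone_url`.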

bk2204

You can use the GraphQL API provided by GitHub. You can pull all audit logs from your organization with a Python script; I was recently working on this.

The GraphQL API returns at most 100 log entries per request, so you have to use a cursor to page through the results until you reach the end of the log.

Refer to this link if you want to learn about cursors: https://graphql.org/learn/pagination/

This is the source code:

import requests
import json
import pandas as pd
from datetime import datetime
import time

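# Personal access token for an account that can read the audit log
headers = {"Authorization": "token YOUR PERSONAL TOKEN"}

# Enterprise slug, wrapped in double quotes because it is spliced into the query text
enterprise = '"ENTERPRISE"'
organizations = []
after = ''  # pagination argument; empty for the first page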
while True:
    getOrgantionsListQuery = """
                                query {
                                      enterprise(slug: """+ enterprise + """) {
                                        ...enterpriseFragment
                                      }
                                    }

                                    fragment enterpriseFragment on Enterprise {
                                      ... on Enterprise{
                                        name
                                        organizations(first: 100, """ + after +"""){
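                                          edges{
                                            node{
                                              # login is the identifier that organization(login: ...) expects;
                                              # name is only the display name and may differ or be empty
                                              login
                                              name
                                            }
                                            cursor
                                          }
                                          pageInfo {
                                            endCursor
                                            hasNextPage
                                            hasPreviousPage
                                          }
                                        }
                                      }
                                    }
                            """

    result = requests.post('https://api.github.com/graphql',
                                    json={'query': getOrgantionsListQuery},
                                    headers=headers)

    enterpriseData = json.loads(result.text)

    if 'errors' in enterpriseData:
        # Not every error carries a 'type'; fall back to its message
        error = enterpriseData['errors'][0]
        print(enterprise + " " + str(error.get('type') or error.get('message')))
        break

    enterpriseAudit = enterpriseData['data']['enterprise']['organizations']

    for org in enterpriseAudit['edges']:
        # Collect logins, because the audit log query looks organizations up by login
        organizations.append(org['node']['login'])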

    if not enterpriseAudit['pageInfo']['hasNextPage']:
        break

    after = 'after: "' + str(enterpriseAudit['edges'][-1]['cursor']) + '"'
    time.sleep(1)


response = []

for org in organizations:   
    after = ''
    org = '"' + org + '"'
    while True:
        getAuditLogQuery = """
                    query {
                      organization(login: """+ org + """) {
                        auditLog(first: 100, """ + after +""") {
                          edges {
                            node {
                              ... on RepositoryAuditEntryData {
                                repository {
                                  name
                                }
                              }
                              ... on OrganizationAuditEntryData {
                                organizationResourcePath
                                organizationName
                                organizationUrl
                              }

                              ... on TeamAuditEntryData {
                                teamName
                              }

                              ... on TopicAuditEntryData {
                                topicName
                              }

                              ... on OauthApplicationAuditEntryData {
                                oauthApplicationName
                              }
                              
                              ... on EnterpriseAuditEntryData {
                                enterpriseResourcePath
                                enterpriseUrl
                                enterpriseSlug
                              }

                              ... on AuditEntry {
                                actorResourcePath
                                action
                                actorIp
                                actorLogin
                                operationType
                                createdAt
                                actorLocation {
                                  countryCode
                                  country
                                  regionCode
                                  region
                                  city
                                }
                                #User 'Action' was performed on
                                userLogin
                                userResourcePath
                                userUrl
                              }
                            }
                            cursor
                          }
                          pageInfo {
                            endCursor
                            hasNextPage
                            hasPreviousPage
                          }
                        }
                      }
                    }
                """
        result = requests.post('https://api.github.com/graphql',
                                json={'query': getAuditLogQuery},
                                headers=headers)

        organizationData = json.loads(result.text)

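        if 'errors' in organizationData:
            # Not every error carries a 'type'; fall back to its message
            error = organizationData['errors'][0]
            print(org + " " + str(error.get('type') or error.get('message')))
            break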

        auditLog = organizationData['data']['organization']['auditLog']

        print(org + " " + str(len(auditLog['edges'])))

        for log in auditLog['edges']:
            response.append(log)

        if not auditLog['pageInfo']['hasNextPage']:
            break

        after = 'after: "' + str(auditLog['edges'][-1]['cursor']) + '"'
        time.sleep(1)


# Each row holds one edge: its 'node' (the audit entry) and its 'cursor'
df = pd.DataFrame(response)
df.to_json(r'/YOUR/PATH/TO/SAVE' + str(datetime.now()) +'.json')
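If you would rather not build the query by string concatenation, the same cursor loop can be written with GraphQL variables. The following is only a sketch of that idea: the helper names (`post_graphql`, `iter_audit_log`) are invented for this example, it requests just three `AuditEntry` fields to stay short, and it uses the standard library instead of `requests`.

```python
import json
import urllib.request

# $cursor is null on the first request, then the previous page's endCursor
AUDIT_LOG_QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    auditLog(first: 100, after: $cursor) {
      nodes { ... on AuditEntry { action actorLogin createdAt } }
      pageInfo { endCursor hasNextPage }
    }
  }
}
"""


def post_graphql(query, variables, token):
    """POST a query and its variables to the GitHub GraphQL endpoint."""
    body = json.dumps({"query": query, "variables": variables}).encode()
    req = urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={"Authorization": "token " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_audit_log(org, token, post=post_graphql):
    """Yield audit log entries one by one, following the cursor page by page."""
    cursor = None
    while True:
        data = post(AUDIT_LOG_QUERY, {"org": org, "cursor": cursor}, token)
        if "errors" in data:
            raise RuntimeError(data["errors"])
        log = data["data"]["organization"]["auditLog"]
        yield from log["nodes"]
        if not log["pageInfo"]["hasNextPage"]:
            return
        cursor = log["pageInfo"]["endCursor"]
```

With this, `list(iter_audit_log("my-org", token))` collects every entry, and because the organization login is passed as a variable there is no need to wrap it in quotes by hand.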

Check if GitHub Audit Log Streaming can help in your case.

It just got out of beta (Jan. 2022):

Audit log streaming is generally available

GitHub audit log streaming is now out of beta and generally available.

Your experience using audit log streaming will not change, but we expanded the number of options you have for where you can stream your audit and Git events:

  • Amazon S3
  • Azure Blob Storage
  • Azure Event Hubs
  • Google Cloud Storage
  • Splunk

Enterprise owners can set up their stream in minutes by navigating to their enterprise account settings under the Audit log tab and configuring the collection endpoint.

VonC