I want to fetch the entire commit history from a GitHub organization consisting of 225+ code repos, both private and public. I saw a lot of other solutions on Google and Stack Overflow, but couldn't settle on a single one. I am looking for an automated solution where we can fetch all the commit history once and then schedule it to run from a particular date. I wasn't able to do it with the GitHub APIs, as GitHub restricts the number of API hits per day to the server.

Primarily, I am trying to fetch all the commit information into a CSV file. Kindly share any Python code/script that serves this purpose.

Syed Ahmad
Rajmdg

3 Answers

Had a similar requirement earlier. I solved it using the Python logic below. I assume you have already cloned all the code repos and that your SSH key is added to your global Git configuration.

  1. Add all the repos to a list and loop across it, pulling the remote origin updates/ref-HEADs of all branches for each code repo in the list.
  2. Use git log to get all the commits initially via git log --all, then automate it to run weekly/monthly as per your requirement using git log --all --after={date} (see the date sketch after the snippet).
  3. Write the git log output in a pretty (comma-separated) format and append it to a single CSV file consistently.

Please find the code snippet below.

import os

# Loop through your set of repos (use absolute paths so os.chdir works on every iteration)
repos = ["<Path:>/Code repo 1", "<Path:>/Code repo 2"]
for repo in repos:
    os.chdir(repo)
    # git pull all the remote origin updates from all branches
    os.system("git pull --all")
    # git log --all (for the initial log) & then update it with --after=<date>
    # (from a specified date - you can automate/schedule it, see the sketch below)
    name = os.path.basename(repo)
    cmd = "git log --all --after=2021-06-10 --pretty=format:'{},%h,%an,%ad,%s' > {}.csv".format(name, name)
    os.system(cmd)
    src = "{}.csv".format(name)
    # Append each repo's log to the combined output file,
    # adding a newline separator and skipping repos with no commits in any branch
    with open('<Path:>/Gitlog-output.csv', 'a+', newline="") as tf:
        if os.path.getsize(src) != 0:
            with open(src) as f:
                tf.write(f.read())
            tf.write("\n")

    print("Finished logging {}".format(repo))
    # To track the list of remaining repos from your list
    print("Remaining Repos: {}".format(len(repos) - repos.index(repo) - 1))
    print("#####################################")
suvam97

If you were trying to get the commit history of a single repository, the following command would be very helpful:

git log > data.csv

For 225 repos, you would need to run this command for each repo separately, probably in a for loop or a while loop, as sketched below.
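
For example, a minimal sketch of such a loop in Python (the repos parent directory and the per-repo output file names are assumptions here, not part of the original setup):

import subprocess
from pathlib import Path

base = Path("repos")  # assumed parent directory holding all the clones
for repo in sorted(base.iterdir()):
    if not (repo / ".git").is_dir():
        continue  # skip anything that is not a git clone
    # run git log inside the repo, the per-repo equivalent of: git log > data.csv
    log = subprocess.run(["git", "-C", str(repo), "log"],
                         capture_output=True, text=True, check=True).stdout
    # write one output file per repository
    (base / (repo.name + ".csv")).write_text(log)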

Harris Minhas

If the repos are in a flat directory, you can do this with a one-liner.

ls -d **/ | xargs -I{} -P12 git -C {} log --all --after="<date> 00:00" --before="<date> 23:59"
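
Note that **/ only recurses when bash's globstar option is set; in a flat directory a plain */ yields the same list. The -P12 flag tells xargs to run up to twelve git log processes in parallel.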

Can't help with the per-day restrictions when doing it with the API, though.

Serve Laurijssen