
I am planning to build a plug-in for the Sphinx documentation system that shows the names and GitHub profile links of the people who have contributed to a documentation page.

GitHub has this feature internally, in the "Contributors" list on a file's page.

  • Is it possible to get the GitHub profile links of a file's contributors through the GitHub API? Note that committer emails are not enough; one must be able to map them to a GitHub user profile link. Also note that I don't want all repository contributors, just the contributors to an individual file.

  • If this is not possible, what alternative methods (private API, scraping) could you suggest for extracting this information from GitHub?

Mikko Ohtamaa

4 Answers


First, you can show the commits for a given file:

https://api.github.com/repos/:owner/:repo/commits?path=PATH_TO_FILE

For instance:

https://api.github.com/repos/git/git/commits?path=README

Second, the author section of that JSON response contains a URL field named 'html_url' that points to the GitHub profile:

"author": {
      "login": "gitster",
      "id": 54884,
      "avatar_url": "https://0.gravatar.com/avatar/750680c9dcc7d0be3ca83464a0da49d8?d=https%3A%2F%2Fidenticons.github.com%2Ff8e73a1fe6b3a5565851969c2cb234a7.png",
      "gravatar_id": "750680c9dcc7d0be3ca83464a0da49d8",
      "url": "https://api.github.com/users/gitster",   
      "html_url": "https://github.com/gitster",       <==========
      "followers_url": "https://api.github.com/users/gitster/followers",
      "following_url": "https://api.github.com/users/gitster/following{/other_user}",
      "gists_url": "https://api.github.com/users/gitster/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/gitster/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/gitster/subscriptions",
      "organizations_url": "https://api.github.com/users/gitster/orgs",
      "repos_url": "https://api.github.com/users/gitster/repos",
      "events_url": "https://api.github.com/users/gitster/events{/privacy}",
      "received_events_url": "https://api.github.com/users/gitster/received_events",
      "type": "User"
    },

So you shouldn't need to scrape any web page here.


Here is a very crude jsfiddle to illustrate that, based on the javascript extract:

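// Example value; the extract assumes filename is defined elsewhere.
var filename = "README";
var url = "https://api.github.com/repos/git/git/commits?path=" + encodeURIComponent(filename);
$.getJSON(url, function(data) {
    // Collect the profile URL of each commit author into a list.
    var contributorList = $("<ul />");
    $.each(data, function(index, item) {
        // item.author is null when the commit email matches no GitHub account.
        if (item.author) {
            $("<li />", {
                "text": item.author.html_url
            }).appendTo(contributorList);
        }
    });
    contributorList.appendTo("body");
});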
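
Since the plug-in in question targets Sphinx, the same lookup can be sketched in Python. This is only a minimal sketch under stated assumptions: the helper names (commits_url, unique_profiles, fetch_contributors) are hypothetical, only the first page of results is read (per_page=100), and the request is unauthenticated, so it is subject to GitHub's rate limits.

```python
import json
import urllib.parse
import urllib.request


def commits_url(owner, repo, path):
    """Build the REST URL that lists the commits touching one file."""
    query = urllib.parse.urlencode({"path": path, "per_page": 100})
    return f"https://api.github.com/repos/{owner}/{repo}/commits?{query}"


def unique_profiles(commits):
    """Return (login, html_url) pairs in first-seen order, without duplicates."""
    seen = {}
    for commit in commits:
        # "author" is null when the commit email maps to no GitHub account.
        author = commit.get("author")
        if author and author["login"] not in seen:
            seen[author["login"]] = author["html_url"]
    return list(seen.items())


def fetch_contributors(owner, repo, path):
    """Fetch the first page of commits for a file and extract author profiles."""
    request = urllib.request.Request(
        commits_url(owner, repo, path),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return unique_profiles(json.load(response))
```

Calling fetch_contributors("git", "git", "README") would return the login and profile link of each distinct author. Commits whose author is null are skipped, because they cannot be mapped to a profile.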

VonC

Using the GraphQL API v4, you can use:

{
  repository(owner: "torvalds", name: "linux") {
    object(expression: "master") {
      ... on Commit {
        history(first: 100, path: "MAINTAINERS") {
          nodes {
            author {
              email
              name
              user {
                email
                name
                avatarUrl
                login
                url
              }
            }
          }
        }
      }
    }
  }
}

Try it in the explorer

Using curl and jq to get a list, without duplicates, of the authors of the 100 most recent commits to this file:

TOKEN=<YOUR_TOKEN>
OWNER=torvalds
REPO=linux
BRANCH=master
FILEPATH=MAINTAINERS
curl -s -H "Authorization: token $TOKEN" \
     -H  "Content-Type:application/json" \
     -d '{ 
          "query": "{repository(owner: \"'"$OWNER"'\", name: \"'"$REPO"'\") {object(expression: \"'"$BRANCH"'\") { ... on Commit { history(first: 100, path: \"'"$FILEPATH"'\") { nodes { author { email name user { email name avatarUrl login url}}}}}}}}"
      }' https://api.github.com/graphql | \
      jq '[.data.repository.object.history.nodes[].author| {name,email}]|unique'
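
The same query can be sketched in Python for use inside a Sphinx plug-in. This is a rough sketch, not a definitive implementation: the function names are hypothetical, the token is assumed to be a personal access token supplied by the caller, and GraphQL variables are used instead of splicing values into the query string. Unlike the jq filter above, it keeps only authors that GitHub has linked to a user account, since those are the ones with a profile URL.

```python
import json
import urllib.request

QUERY = """
query($owner: String!, $repo: String!, $branch: String!, $path: String!) {
  repository(owner: $owner, name: $repo) {
    object(expression: $branch) {
      ... on Commit {
        history(first: 100, path: $path) {
          nodes { author { name email user { login url } } }
        }
      }
    }
  }
}
"""


def build_payload(owner, repo, branch, path):
    """Serialize the query and its variables as a GraphQL request body."""
    variables = {"owner": owner, "repo": repo, "branch": branch, "path": path}
    return json.dumps({"query": QUERY, "variables": variables}).encode("utf-8")


def profile_links(response):
    """Map login -> profile URL for every author linked to a GitHub account."""
    nodes = response["data"]["repository"]["object"]["history"]["nodes"]
    links = {}
    for node in nodes:
        # "user" is null when the commit email matches no GitHub account.
        user = (node.get("author") or {}).get("user")
        if user:
            links[user["login"]] = user["url"]
    return links


def fetch_profile_links(token, owner, repo, branch, path):
    """POST the query to the GraphQL endpoint and extract the profile links."""
    request = urllib.request.Request(
        "https://api.github.com/graphql",
        data=build_payload(owner, repo, branch, path),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return profile_links(json.load(response))
```

For example, fetch_profile_links(token, "torvalds", "linux", "master", "MAINTAINERS") would return a dictionary keyed by login.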
Bertrand Martel

Why do you need to use the GitHub API for that? You can just clone the repository and use git log:

git log --format=format:%an ver1..ver2 -- path/to/file | sort -u
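
From Python, the same idea might look like the sketch below. The function names and the tab-separated format are assumptions made for illustration. Note that it yields only names and emails, so it still does not provide the mapping to GitHub profiles that the question asks for.

```python
import subprocess


def parse_authors(log_output):
    """Turn 'name<TAB>email' lines into a sorted list of unique (name, email) pairs."""
    pairs = set()
    for line in log_output.splitlines():
        if line.strip():
            name, _, email = line.partition("\t")
            pairs.add((name, email))
    return sorted(pairs)


def file_authors(repo_dir, path, rev_range="HEAD"):
    """Run git log for one file in a local clone and return its unique authors."""
    result = subprocess.run(
        # %an = author name, %x09 = tab, %ae = author email
        ["git", "log", "--format=%an%x09%ae", rev_range, "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_authors(result.stdout)
```

Putting the revision range before the "--" separator and the path after it keeps git from confusing the two.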

plaes
  • Note that committer emails are not enough, one must be able to map them to a Github user profile link. <-- Which part is difficult to understand? – Mikko Ohtamaa Nov 17 '12 at 13:14
  • It wouldn't be that hard to add another layer (similar to `.mailmap`) that maps email -> github user. – plaes Nov 17 '12 at 13:16
  • @MikkoOhtamaa You can just search for the email address in GitHub. – Markus Unterwaditzer Nov 17 '12 at 13:17
  • And how you are going to get that information from Github? That's **the question**. – Mikko Ohtamaa Nov 17 '12 at 13:17
  • Email search is no go: This API call is added for compatibility reasons only. There’s no guarantee that full email searches will always be available. The @ character in the address must be left unencoded. Searches only against public email addresses (as configured on the user’s GitHub profile). – Mikko Ohtamaa Nov 17 '12 at 13:19

Unless you need to interact with the GitHub API directly, you can get the list of contributors by cloning the repository, changing into the cloned directory, and reading the list from the git log with the shortlog command:

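import subprocess

# Clone the repository into the current directory (creates ./python).
subprocess.run(["git", "clone", "https://github.com/poise/python.git"], check=True)

# Count commits per author for the whole repository, most active first.
# An explicit HEAD keeps git shortlog from waiting for input on stdin.
output = subprocess.run(
    ["git", "shortlog", "-s", "-n", "HEAD"],
    cwd="python", capture_output=True, text=True, check=True,
).stdout
print(output)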

If you do want to use the GitHub API, another option is the pygithub3 wrapper, whose list_contributors method returns the repository's contributors:

from pygithub3.services.repos import Repo

r = Repo()
# Placeholders: substitute the repository owner and name.
contributors = r.list_contributors(user='userid/author', repo='repo name')
# The result is paginated: iterate over pages, then over entries.
for page in contributors:
    for result in page:
        print(result)
D_0909