12

I had to switch my public GitHub repository to private, and now I cannot access its files, not even with the access tokens that worked while the repo was public.

I can access my private repo's CSV with curl:

    curl -s https://{token}@raw.githubusercontent.com/username/repo/master/file.csv

However, I want to access this information in my Python file. When the repo was public I could simply use:

    url = 'https://raw.githubusercontent.com/username/repo/master/file.csv'
    df = pd.read_csv(url, error_bad_lines=False)

This no longer works now that the repo is private, and I cannot find a workaround to download this CSV in Python instead of pulling it from the terminal.

If I try:

    requests.get('https://{token}@raw.githubusercontent.com/username/repo/master/file.csv')

I get a 404 response, which is basically what happens with pd.read_csv() as well. If I click on the raw file, I see that a temporary token is created and the URL is:

    https://raw.githubusercontent.com/username/repo/master/file.csv?token=TEMPTOKEN

Is there a way to attach my permanent access token so that I can always pull this data from GitHub?

everwitt7

6 Answers

7

Yes, you can download the CSV file in Python instead of pulling it from the terminal. To do that, use the GitHub API v3 together with the 'requests' and 'io' modules. A reproducible example follows.

import numpy as np
import pandas as pd
import requests
from io import StringIO

# Create CSV file
df = pd.DataFrame(np.random.randint(2,size=10_000).reshape(1_000,10))
df.to_csv('filename.csv') 

# -> now upload file to private github repo

# define parameters for a request
token = 'paste-there-your-personal-access-token' 
owner = 'repository-owner-name'
repo = 'repository-name-where-data-is-stored'
path = 'filename.csv'

# send a request
r = requests.get(
    'https://api.github.com/repos/{owner}/{repo}/contents/{path}'.format(
    owner=owner, repo=repo, path=path),
    headers={
        'accept': 'application/vnd.github.v3.raw',
        'authorization': 'token {}'.format(token)
            }
    )

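# fail early on an HTTP error (e.g. a bad token or path) instead of parsing the error body as CSV
r.raise_for_status()

# convert string to StringIO object
string_io_obj = StringIO(r.text)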

# Load data to df
df = pd.read_csv(string_io_obj, sep=",", index_col=0)

# optionally write df to CSV
df.to_csv("file_name_02.csv")
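
If the file lives on a branch other than the default one, the contents endpoint accepts a `ref` query parameter. Below is a minimal sketch of that variation, not a definitive implementation. It assumes the same token-style `Authorization` header as above and swaps `requests` for the standard library's `urllib.request` so it runs without extra packages; the function names `build_contents_request` and `fetch_text` are made up for illustration.

```python
import urllib.parse
import urllib.request


def build_contents_request(owner, repo, path, token, ref=None):
    """Build (but do not send) a GET request for a file's raw contents.

    `ref` is an optional branch, tag, or commit SHA; when it is omitted,
    GitHub uses the repository's default branch.
    """
    url = "https://api.github.com/repos/{}/{}/contents/{}".format(
        owner, repo, urllib.parse.quote(path)
    )
    if ref is not None:
        url += "?" + urllib.parse.urlencode({"ref": ref})
    headers = {
        "Accept": "application/vnd.github.v3.raw",
        "Authorization": "token {}".format(token),
    }
    return urllib.request.Request(url, headers=headers)


def fetch_text(request, timeout=30):
    """Send the request and return the response body as text.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a bad
    token or path fails loudly instead of being parsed as CSV.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The returned text can be loaded exactly as in the example above, for instance by wrapping `fetch_text(build_contents_request(owner, repo, path, token, ref="dev"))` in `StringIO` and passing it to `pd.read_csv`.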
John Smith
3

This is what ended up working for me - leaving it here if anyone runs into the same issue. Thanks for the help!

    import io

    import pandas
    import requests

    user = 'my_github_username'
    pao = 'my_pao'  # your personal access token

    github_session = requests.Session()
    github_session.auth = (user, pao)

    # providing raw url to download csv from github
    csv_url = 'https://raw.githubusercontent.com/user/repo/master/csv_name.csv'

    download = github_session.get(csv_url).content
    # error_bad_lines=False was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) replaces it
    downloaded_csv = pandas.read_csv(io.StringIO(download.decode('utf-8')), on_bad_lines='skip')
Sachin
everwitt7
  • Hey, I am working with a similar private repo (.csv) file and want to know whether there is a way to edit the file hosted on GitHub, not just read it. Intuitively I assume I should be able to edit the data since I can read it, but I can't arrive at a solution. – Surya Palaniswamy Jun 17 '21 at 18:50
  • You can make edits to the file, but you would then have to commit and push your changes to the repository. For that you can just use git commands. – everwitt7 Jul 29 '21 at 16:54
  • This also worked for me, as I created a fine-grained token via GitHub and used it in the requests. It also works with the classic token. Thanks, @everwitt7 – Abdur Rahman Jan 04 '23 at 11:01
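
On the editing question raised in the comments: besides committing and pushing with git, the GitHub contents API also accepts a PUT to the same `/repos/{owner}/{repo}/contents/{path}` endpoint. The following is only a sketch under assumptions: the function name `build_update_request` is hypothetical, the request is built with the standard library rather than `requests`, and `sha` must be the blob SHA of the current file, which the JSON response of a GET on the same endpoint reports in its `sha` field.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_update_request(owner, repo, path, token, new_text, sha, message, branch=None):
    """Build (but do not send) a PUT request that replaces a file's contents.

    The API expects the new contents base64-encoded, a commit message, and
    the blob SHA of the file being replaced.
    """
    payload = {
        "message": message,
        "content": base64.b64encode(new_text.encode("utf-8")).decode("ascii"),
        "sha": sha,
    }
    if branch is not None:
        payload["branch"] = branch  # otherwise the default branch is used

    url = "https://api.github.com/repos/{}/{}/contents/{}".format(
        owner, repo, urllib.parse.quote(path)
    )
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": "token {}".format(token),
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="PUT"
    )
```

Sending the request with `urllib.request.urlopen(...)` creates a commit on the target branch, so the token needs write access to the repository's contents.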
1

This approach works really well for me:

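    import os

    import requests


    def _github(url: str, mode: str = "private"):
        # turn a github.com link (".../blob/..." or ".../raw/...") into its raw.githubusercontent.com form
        url = url.replace("/blob/", "/")
        url = url.replace("/raw/", "/")
        url = url.replace("github.com/", "raw.githubusercontent.com/")

        if mode == "public":
            return requests.get(url)
        else:
            # read the token from the environment; the second argument is the fallback value
            token = os.getenv('GITHUB_TOKEN', '...')
            headers = {
                'Authorization': f'token {token}',
                'Accept': 'application/vnd.github.v3.raw'}
            return requests.get(url, headers=headers)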
PathSeeker
  • But how do I use this function? Please elaborate on that. – Sachin Nov 18 '21 at 05:50
  • Works like a charm. @Sachin, to answer your question, use it like this: 1. Get a personal access token from GitHub. 2. Paste your token into the token variable. 3. Call the function: response = _github(url=github_file_url.json). 4. Use response.text to get the output data. – iampritamraj Mar 15 '23 at 15:50
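
To make the URL handling in this answer concrete, here is a small sketch that isolates the three `replace` calls and shows what they produce. The helper name `to_raw_url` and the example links are placeholders for illustration, not part of the original answer.

```python
def to_raw_url(url: str) -> str:
    """Rewrite a github.com file link to its raw.githubusercontent.com form."""
    url = url.replace("/blob/", "/")
    url = url.replace("/raw/", "/")
    return url.replace("github.com/", "raw.githubusercontent.com/")


# Hypothetical links; user, repo, branch and file names are placeholders.
blob_link = "https://github.com/user/repo/blob/main/data/file.csv"
raw_link = "https://github.com/user/repo/raw/main/data/file.csv"

print(to_raw_url(blob_link))
# → https://raw.githubusercontent.com/user/repo/main/data/file.csv
print(to_raw_url(raw_link))
# → https://raw.githubusercontent.com/user/repo/main/data/file.csv
```

So the function can be called with the link copied from the browser's address bar, and `response.text` can then be wrapped in `io.StringIO` and passed to `pd.read_csv`. One caveat of this plain string replacement: a file path that itself contains a `/blob/` or `/raw/` directory would be rewritten as well.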
1

Adding another working example:

import requests
from requests.structures import CaseInsensitiveDict

# Variables
GITHUB_TOKEN = "my-personal-access-token"  # placeholder: paste your own token here
GH_PREFIX = "https://raw.githubusercontent.com"
ORG = "my-user-name"
REPO = "my-repo-name"
BRANCH = "main"
FOLDER = "some-folder"
FILE = "some-file.csv"
URL = GH_PREFIX + "/" + ORG + "/" + REPO + "/" + BRANCH + "/" + FOLDER + "/" + FILE

# Headers setup
headers = CaseInsensitiveDict()
headers["Authorization"] = "token " + GITHUB_TOKEN

# Execute and view status
resp = requests.get(URL, headers=headers)
if resp.status_code == 200:
    print(resp.content)
else:
    print("Request failed with status", resp.status_code)
Rot-man
0

Have you looked at PyGithub? It is very useful for accessing repos, files, pull requests, history, etc. Docs are here. Here's an example script that creates a new branch off a base branch (you'll need that access token, or generate a new one!), removes a file on it, and opens a pull request:

from github import Github
my_reviewers = ['usernames', 'of_reviewers']
gh = Github("<token string>")
repo_name = '<my_org>/<my_repo>'
repo = gh.get_repo(repo_name)
default_branch_name = repo.default_branch
base = repo.get_branch(default_branch_name)
new_branch_name = "my_new_branchname"
new_branch = repo.create_git_ref(ref=f'refs/heads/{new_branch_name}',sha=base.commit.sha)
contents = repo.get_contents("some_script_in_repo.sh", ref=new_branch_name)
repo.delete_file(contents.path, "commit message", contents.sha, branch=new_branch_name)
pr = repo.create_pull(
    title="PR to Remove some_script_in_repo.sh",
    body="This is the text in the main body of your pull request",
    head=new_branch_name,
    base=default_branch_name,
)
pr.create_review_request(reviewers=my_reviewers)
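
For the original question (reading a CSV rather than deleting a file), the same `get_contents` call can be used on its own. The sketch below is an assumption-laden illustration: `read_repo_file` is a made-up helper, and `repo` is expected to be a PyGithub repository object such as the `gh.get_repo(repo_name)` result above. It relies on the `decoded_content` attribute of the returned content file, which holds the file's bytes.

```python
def read_repo_file(repo, path, ref=None):
    """Return the text of one file from a PyGithub repository object.

    `ref` is an optional branch, tag, or commit SHA; when it is omitted,
    the repository's default branch is used.
    """
    kwargs = {} if ref is None else {"ref": ref}
    contents = repo.get_contents(path, **kwargs)
    if isinstance(contents, list):
        # get_contents returns a list when the path is a directory
        raise ValueError("{!r} is a directory, not a file".format(path))
    return contents.decoded_content.decode("utf-8")
```

The returned string can be wrapped in `io.StringIO` and handed to `pd.read_csv`. Very large files may not come back with inline content through this endpoint, so treat this as suitable for modestly sized CSVs.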

Hope that helps, happy coding!

Sam
0

Apparently, raw.githubusercontent.com links nowadays also work with just a token, but with Python's requests they need a username:token combination, which used to be the norm before GitHub changed it so that a token alone is sufficient.

So:

https://{token}@raw.githubusercontent.com/username/repo/master/file.csv

becomes

https://{username}:{token}@raw.githubusercontent.com/username/repo/master/file.csv

A sample code for the above would be as follows:

from requests import get as rget

res = rget("https://<username>:<token>@raw.githubusercontent.com/<username>/<repo>/master/file.csv")
with open('file.csv', 'wb+') as f:
    f.write(res.content)
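
A `username:token@` prefix in a URL is sent as HTTP Basic authentication, so the same credentials can be put in an `Authorization` header instead, which keeps the token out of the URL (and out of logs that record URLs). The following is a minimal sketch of that idea using only the standard library; the function name `build_basic_auth_request` and its parameters are illustrative assumptions, not part of the answer above.

```python
import base64
import urllib.request


def build_basic_auth_request(username, token, owner, repo, branch, path):
    """Build (but do not send) a raw-file request that uses HTTP Basic auth.

    This carries the same credentials as a
    https://username:token@raw.githubusercontent.com/... URL.
    """
    url = "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        owner, repo, branch, path
    )
    credentials = base64.b64encode(
        "{}:{}".format(username, token).encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": "Basic " + credentials}
    )
```

With `requests`, the equivalent is to pass `auth=(username, token)` to `requests.get` together with the plain URL.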