In short: I need to get the date of the last change to a file hosted on GitHub.

In long: I have a file (an R workspace) on GitHub that is updated once in a while, and I would like to create a function in R that checks whether my local file is older than the one in the repo (if you're curious, my motivation is explained at the end of this post). This is the file I'm talking about.

In principle it should be fairly easy, since every file has a history page associated with it, but I don't know enough to do anything with that. Also, this Q seems to hint at a way of doing what I want using PHP, but that's terra incognita for me, so I don't know if it could help in any way.

So, as I said in the short version of this post, I need to find a way to retrieve the date of the last commit for this file. I can find some way to compare it to the commit date of my local file afterwards.

Thanks in advance, Juan

Motivation: I'm working on an online course in R basics which uses a system for self-checking whether solutions to exercises are correct (i.e. students can check their results instantly). This system uses a file with functions and data that is regularly updated, because I often find bugs and new problems. So my goal is to have a function that tells the students if a newer file is available. It would also be neat to find a way to download it and replace the older one, but that is secondary for now.

Juan

3 Answers

3

The difficulty is that a downloaded file gets the download time as its modification time, not the Git commit time. The solution below sets the file time to the Git commit date after each download, so that the next check compares against it.

library(RCurl)
library(rjson)
destination = "datos" # assume current directory
repo = "https://api.github.com/repos/jumanbar/Curso-R/"
path = "ejercicios-de-programacion/rep-3/datos"
# ssl.verifypeer=FALSE turns off certificate checking (see note below)
myopts = curlOptions(useragent="whatever",ssl.verifypeer=FALSE)

# Most recent commit that touched the file
d = fromJSON(getURL(paste0(repo,"commits?path=",path), .opts=myopts))[[1]]
# The API returns ISO 8601 UTC timestamps (YYYY-MM-DDTHH:MM:SSZ)
gitDate  = as.POSIXct(d$commit$author$date, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")
# Download if there is no local copy, or if the local copy is older than the commit
MustDownload = !file.exists(destination) || file.info(destination)$mtime < gitDate
if (MustDownload){
  url = d$url
  commit = fromJSON(getURL(url, .opts=myopts))
  files = unlist(lapply(commit$files,"[[","filename"))
  rawfile = commit$files[[which(files==path)]]$raw_url
  download.file(rawfile,destination,quiet=TRUE,mode="wb") # workspace is binary
  Sys.setFileTime(destination,gitDate) # remember the Git date for the next check
  print("File was downloaded")
}

It looks like the useragent and ssl.verifypeer options are required when calling from R; from the command line it works without them. Note that ssl.verifypeer=FALSE disables certificate verification. If you are security-conscious, there is documentation on that subject floating around, but I took the easy path to commit.
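The date check can be tried without touching the network. The following is a minimal sketch, assuming the timestamp arrives as an ISO 8601 UTC string; the helper names `parse_github_date` and `needs_download` and the sample timestamp are illustrative and not part of the answer above.

```r
# Parse an ISO 8601 UTC timestamp such as "2013-05-16T08:08:30Z"
# (an illustrative value, not taken from the real repository).
parse_github_date <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# TRUE when the local copy is missing or older than the remote commit.
needs_download <- function(local_path, remote_time) {
  if (!file.exists(local_path)) return(TRUE)
  file.info(local_path)$mtime < remote_time
}

# Usage with a throwaway file stamped one hour before the commit date.
remote <- parse_github_date("2013-05-16T08:08:30Z")
tmp <- tempfile()
writeLines("placeholder", tmp)
Sys.setFileTime(tmp, remote - 3600)
needs_download(tmp, remote)   # TRUE: the local copy is older
Sys.setFileTime(tmp, remote)
needs_download(tmp, remote)   # FALSE: the local copy carries the commit date
```

Stamping the file with the commit date is what makes the second check return FALSE, which is the same trick the answer uses with Sys.setFileTime.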

Dieter Menne
  • This is wonderful, thank you! I still cannot get the file to download from R, and I really don't know what to make of the `useragent` or `ssl.verifypeer` options. For the former I've tried "Mozilla/5.0" to no avail. When `download.file` is run I get "download had nonzero exit status" using `method="wget"` or `"curl"` and "URL scheme not supported" (translated from Spanish). Anyway, this is what I was hoping for. – Juan May 17 '13 at 22:26
  • Try pasting the string from rawfile into the browser, and check whether you can download it that way when logged in to GitHub. – Dieter Menne May 19 '13 at 08:01
2

It seems you need a local clone of the GitHub repo. Forgetting the language specifics of R for the moment (I don't know R), in git you can get the most recent date in a number of ways through git log. From the git log help file (git help log), under the Placeholders section:

%cd: committer date
%cD: committer date, RFC2822 style
%cr: committer date, relative
%ct: committer date, UNIX timestamp
%ci: committer date, ISO 8601 format

You can retrieve the UNIX timestamp (seconds since the start of January 1st, 1970 - very easily comparable) of the most recent commit for your file, starting from the project root, with the following git log command:

git log --format=%ct -1 -- ejercicios-de-programacion/rep-3/datos

That returns a number, e.g. 1368691710, but you can use the other formats listed as well.

Now you just need to find a way to make this system call from R, with your project root as the working directory. This SO post may help (but again, I don't R).
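One way the system call might look from R is sketched below, assuming git is on the PATH and a local clone exists. The helper names `unix_to_time` and `last_commit_time` are hypothetical; `system2` and `shQuote` are base R.

```r
# Convert the output of `git log --format=%ct` (seconds since 1970-01-01 UTC)
# to a date-time in UTC.
unix_to_time <- function(x) {
  as.POSIXct(as.numeric(x), origin = "1970-01-01", tz = "UTC")
}

# Hypothetical helper: ask git for the last commit time of `path` inside
# the local clone at `repo_dir`. Requires git on the PATH.
last_commit_time <- function(repo_dir, path) {
  out <- system2("git",
                 c("-C", shQuote(repo_dir), "log", "--format=%ct", "-1",
                   "--", shQuote(path)),
                 stdout = TRUE)
  if (length(out) == 0) return(as.POSIXct(NA))  # no commit touches this path
  unix_to_time(out[1])
}

# The example number from above, converted:
format(unix_to_time("1368691710"), "%Y-%m-%d %H:%M:%S")
# "2013-05-16 08:08:30"
```

A call such as last_commit_time(".", "ejercicios-de-programacion/rep-3/datos") would then return a value that can be compared directly with file.info()$mtime.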

Gary Fixler
  • This would certainly work on my computer but not on others (students' PCs), so it is not a viable solution in this case. Thanks anyway. – Juan May 17 '13 at 22:38
0

Perhaps you can make use of the "git status" command (which tells you if there are new commits) in combination with cron jobs. But you need a local clone for this, and I have never tried to use the output of the command inside a cron job.

bish
  • If there are new commits in the remote that you haven't fetched, `git status` won't tell you. You have to `git fetch` first. – Gary Fixler May 17 '13 at 22:47