
I am having trouble accessing the GitHub timeline from BigQuery.

I was using the following query:

SELECT repository_name, actor_attributes_company, payload_ref_type, payload_action, type, created_at
FROM githubarchive:github.timeline
WHERE repository_organization = 'foo' AND created_at > '2014-07-01'

and everything was working great. Now, it looks like the githubarchive:github.timeline table is no longer available. I've been looking around and I found another table:

SELECT repository_name, actor_attributes_company, payload_ref_type, payload_action, type, created_at
FROM publicdata:samples.github_timeline
WHERE repository_organization = 'foo' AND created_at > '2014-07-01'

This query runs but returns zero rows. When I remove the created_at restriction, it returns only a few rows from 2012, so this table appears to contain just sample data.

Does anyone know how to pull live timeline data from GitHub?

Isabel Inc
Alex Jauch

2 Answers


Indeed, publicdata:samples.github_timeline has only sample data.

For the real GitHub Archive documentation, look at http://www.githubarchive.org/

I wrote an article yesterday about querying it:

Sample query:

-- Issue events for three repositories in June 2016, counted by action (legacy SQL)
SELECT repo.name,
       JSON_EXTRACT_SCALAR(payload, '$.action') action,
       COUNT(*) c
FROM [githubarchive:month.201606]
WHERE type IN ('IssuesEvent')
AND repo.name IN ('kubernetes/kubernetes', 'docker/docker', 'tensorflow/tensorflow')
GROUP BY 1, 2
ORDER BY 2 DESC

As Mikhail points out, there's also another dataset with all of GitHub's code:

Felipe Hoffa
  • OK, thanks. Can you confirm that githubarchive:github.timeline has been removed? That's really the heart of my question. – Alex Jauch Jul 20 '16 at 17:23
  • There have been some breaking changes: https://medium.com/google-cloud/github-archive-fully-updated-notice-some-breaking-changes-64e7e7cd0967. – Felipe Hoffa Jul 21 '16 at 04:01

Check out the githubarchive BigQuery project.
It has three datasets (day, month, and year) holding daily, monthly, and yearly data respectively.

Check out https://cloudplatform.googleblog.com/2016/06/GitHub-on-BigQuery-analyze-all-the-open-source-code.html for more details
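As a minimal sketch, a single daily table can be queried directly, in the same legacy SQL as the other queries in this thread. It assumes the daily tables are named by date as YYYYMMDD, by analogy with the `[githubarchive:month.201606]` table used in the other answer; the specific date below is only an example.

```sql
-- Legacy SQL sketch: event counts by type for one day.
-- Assumption: daily tables are named YYYYMMDD (e.g. githubarchive:day.20160719);
-- the date is illustrative only.
SELECT type, COUNT(*) AS events
FROM [githubarchive:day.20160719]
GROUP BY type
ORDER BY events DESC
```

Swapping `day.20160719` for `month.201606` or a year table should give the same breakdown over a longer period, at a higher scan cost.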

Mikhail Berlyant
  • Yes, thank you. I read that article. However, it only allows me to get historical data. The .timeline table used to give you a full feed. That is what I'm looking for so I can write some reports. – Alex Jauch Jul 19 '16 at 20:16
  • If you check the day dataset, you will even see a table with today's data that is updated hourly, plus tables for yesterday, the day before, and so on. Isn't this what you are looking for? – Mikhail Berlyant Jul 19 '16 at 21:25
  • Yes, I think so. I'll look at rewriting the report to take the "Day" tables and compile them into a single view. It was just easier before. I'm not a hugely skilled SQL guy, so I'm just hunting and pecking. – Alex Jauch Jul 20 '16 at 23:35
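One way to combine the daily tables in a single query is legacy SQL's `TABLE_DATE_RANGE`, which reads every table whose name is a given prefix followed by a YYYYMMDD date within a timestamp range. The sketch below rewrites the original report that way. It assumes the post-2016 schema (`repo.name`, `org.login`, `actor.login`, and `payload` stored as a JSON string, as in the `JSON_EXTRACT_SCALAR` sample query), and that `org.login` stands in for the old `repository_organization`. As far as I know the new schema has no actor company field, so `actor_attributes_company` is replaced with `actor.login`. The start date is illustrative; adjust it to the period the report needs.

```sql
-- Legacy SQL sketch: the original timeline report rebuilt on the daily tables.
-- Assumptions: daily tables are named githubarchive:day.YYYYMMDD, and the schema
-- has repo.name, org.login, actor.login and a JSON string column `payload`.
SELECT
  repo.name,
  actor.login,                                           -- no company field in the new schema
  JSON_EXTRACT_SCALAR(payload, '$.ref_type') AS ref_type, -- was payload_ref_type
  JSON_EXTRACT_SCALAR(payload, '$.action') AS action,     -- was payload_action
  type,
  created_at
FROM TABLE_DATE_RANGE([githubarchive:day.],
                      TIMESTAMP('2016-07-01'),   -- illustrative start date
                      CURRENT_TIMESTAMP())       -- includes today's hourly-updated table
WHERE org.login = 'foo'                          -- was repository_organization
```

If a reusable object is preferred, the same query can be saved as a BigQuery view and the report pointed at that view.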