The GitHub Archive project states:
"GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis."
This archive is also queryable through Google BigQuery. However, it looks like I'm either missing something or only a portion of the data is available.
Indeed, running the following query returns only 1636 WatchEvents (started or stopped), whereas the Rails repository has more than 14300 watchers.
SELECT actor_attributes_login, created_at, payload_action
FROM [githubarchive:github.timeline]
WHERE repository_name = "rails"
  AND type = "WatchEvent"
ORDER BY created_at ASC;
The oldest row returned appears to be only about 2.5 months old.
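One way to double-check the time span actually covered would be an aggregate over the same rows. This is only a minimal sketch: it assumes the same legacy SQL table and column names as the query above, and the aliases (oldest_event, newest_event, event_count) are made up for illustration.

```sql
-- Summarize the date range and number of WatchEvents matched by the
-- same filter as the query above (legacy BigQuery SQL syntax).
SELECT
  MIN(created_at) AS oldest_event,   -- earliest timestamp in the result set
  MAX(created_at) AS newest_event,   -- latest timestamp in the result set
  COUNT(*) AS event_count            -- total rows matching the filter
FROM [githubarchive:github.timeline]
WHERE repository_name = "rails"
  AND type = "WatchEvent";
```

If oldest_event comes back only a few months in the past, that would support the impression that the table does not go back any further for this repository.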
Has the data been truncated (which would seem strange for an archive)? Or is there a BigQuery limit or quota that I'm not aware of?