I'm working on an entry for the GitHub Data Challenge and trying to analyze a set of PushEvents, but I'm getting some strange(?) results.
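require "open-uri"
require "zlib"
require "yajl"

# Count PushEvents per user login across the 24 hourly archives for one day.
users = Hash.new(0)
(0..23).each do |hour|
  # URI.open is needed on Ruby 3.0+, where plain open no longer accepts URLs.
  gz = URI.open("http://data.githubarchive.org/2013-04-01-#{hour}.json.gz")
  js = Zlib::GzipReader.new(gz).read
  # Each archive is a stream of JSON objects; the block runs once per event.
  Yajl::Parser.parse(js) do |event|
    if event["type"] == "PushEvent" && event["actor_attributes"] && event["actor_attributes"]["login"]
      users[event["actor_attributes"]["login"]] += 1
    end
  end
end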
This script works fine, but when I look at the highest per-user count via
users.values.max
I see that someone has over 7k PushEvents in a single day. When I go through and print out
event["payload"]["shas"]
all of the printed results are essentially the same:
585a2f02f36da9ee0625a42aa2d5e98836c8a2de
danil@orionet.ru
Notes added by 'git notes add'
Jenkins
true
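For reference, here is how I'm reading those five values. This is only a minimal sketch: it assumes each entry in `shas` is an array ordered as sha, author email, commit message, author name, and a distinct flag. That ordering is my guess from the output, not something I've confirmed, and `SHA_FIELDS` and `label_commit` are names made up for illustration.

```ruby
# Assumed field order for one entry in payload["shas"], inferred from the
# printed output; this is a guess, not a documented schema.
SHA_FIELDS = %i[sha email message name distinct].freeze

# Hypothetical helper: turn one raw shas entry (an array) into a labelled hash.
def label_commit(entry)
  SHA_FIELDS.zip(entry).to_h
end

# The values printed above, written out as a single shas entry.
entry = [
  "585a2f02f36da9ee0625a42aa2d5e98836c8a2de",
  "danil@orionet.ru",
  "Notes added by 'git notes add'",
  "Jenkins",
  true
]

label_commit(entry).each do |field, value|
  puts "#{field}: #{value}"
end
```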
I presume that the commit message associated with each of these PushEvents is "Notes added by 'git notes add'". Does this seem right, or am I misreading the data?
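One way to check would be to tally the commit messages inside each user's PushEvents and see whether a single message dominates. A rough sketch, assuming the message is the third element of each `shas` entry (my reading of the output, not a documented layout); `message_counts` is a hypothetical helper, and the login in the sample event is a placeholder since I haven't shown the real one here.

```ruby
# Hypothetical helper: tally commit messages per login across PushEvents.
# Assumes each shas entry looks like [sha, email, message, name, distinct].
def message_counts(events)
  counts = Hash.new { |hash, login| hash[login] = Hash.new(0) }
  events.each do |event|
    next unless event["type"] == "PushEvent"
    login = event.dig("actor_attributes", "login")
    next unless login
    shas = event.dig("payload", "shas") || []
    shas.each { |entry| counts[login][entry[2]] += 1 }
  end
  counts
end

# Sample event reusing the printed values; "example-user" is a placeholder login.
sample_event = {
  "type" => "PushEvent",
  "actor_attributes" => { "login" => "example-user" },
  "payload" => {
    "shas" => [
      ["585a2f02f36da9ee0625a42aa2d5e98836c8a2de", "danil@orionet.ru",
       "Notes added by 'git notes add'", "Jenkins", true]
    ]
  }
}

message_counts([sample_event]).each do |login, messages|
  messages.each { |message, count| puts "#{login}: #{count} x #{message}" }
end
```

In the real script, the events yielded by `Yajl::Parser.parse` could be collected into an array (or tallied inside the block) instead of using the sample event.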