
I'm trying to pull details on anonymous edits from the Wikimedia API, like so:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&format=json&rcstart=2014-01-01T00%3A00%3A00Z&rcdir=newer&rcnamespace=0&rcprop=user%7Ctimestamp%7Ctitle&rcshow=anon&rclimit=100&generator=allpages&gapnamespace=0&gaplimit=2

Note the "rcshow=anon" parameter.

It works just fine in the API sandbox: https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&list=recentchanges&format=json&rcstart=2014-01-01T00%3A00%3A00Z&rcdir=newer&rcnamespace=0&rcprop=user|timestamp|title&rcshow=anon&rclimit=100&generator=allpages&gapnamespace=0&gaplimit=2

But when I import it into R, I get back lots of non-anonymous edits:

library(rjson)
json_file <- "http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&format=json&rcstart=2014-01-01T00%3A00%3A00Z&rcdir=newer&rcnamespace=0&rcprop=user%7Ctimestamp%7Ctitle&rcshow=anon&rclimit=100&generator=allpages&gapnamespace=0&gaplimit=2"
json_data <- fromJSON(file = json_file)
# Collect the user name attached to each returned change
user <- sapply(json_data$query$recentchanges, function(x) x$user)
user

Like so:

  [1] "ValterVBot"                        
  [2] "67.87.234.41"                      
  [3] "ValterVBot"                        
  [4] "86.143.229.147"                    
  [5] "Luan Francisco"                    
  [6] "לערי ריינהארט"                     
  [7] "Чаховіч Уладзіслаў"                
  [8] "Soulkeeper"                        
  [9] "ValterVBot"                        
 [10] "Soulkeeper"   

Any idea what's going on and how I can get a set of anonymous edits?

Traviskorte
  • The generator you use does not give you any more information; you could skip it for a simpler query, e.g. cut &generator=allpages&gapnamespace=0&gaplimit=2 – Ainali Jul 20 '14 at 10:41
  • Good call, that was left over from a previous query. Thanks. – Traviskorte Jul 20 '14 at 15:48

1 Answer


First of all, it “doesn't work” in the API sandbox either; you just need to use the sandbox on the English Wikipedia, not the one on mediawiki.org.

If you look at the results closely, you'll notice that all of the non-anonymous entries have type external. That means they are edits to the Wikidata item for that article, which show up as anonymous (I assume that's because the Wikidata user who made the change may not exist on the local wiki). To get rid of those edits, set rctype=edit in your query, which filters out the external entries:

http://en.wikipedia.org/w/api.php?action=query&list=recentchanges&format=json&rcstart=2014-01-01T00%3A00%3A00Z&rcdir=newer&rcnamespace=0&rcprop=user|timestamp|title&rcshow=anon&rclimit=100&generator=allpages&gapnamespace=0&gaplimit=2&rctype=edit
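For the R side, here is a minimal sketch of the same query, assuming the rjson setup from the question. The helper names build_rc_url, extract_users and fetch_anon_users are hypothetical and only for illustration; they are not part of rjson or the MediaWiki API. The sketch also drops the generator parameters, as suggested in the comments on the question.

```r
# Hypothetical helper: assemble the recentchanges query with rctype set.
# The generator/gap* parameters are left out because they add nothing here.
build_rc_url <- function(rctype = "edit", rclimit = 100) {
  params <- c(
    action      = "query",
    list        = "recentchanges",
    format      = "json",
    rcstart     = "2014-01-01T00:00:00Z",
    rcdir       = "newer",
    rcnamespace = "0",
    rcprop      = "user|timestamp|title",
    rcshow      = "anon",
    rclimit     = as.character(rclimit),
    rctype      = rctype
  )
  # Percent-encode each value (":" becomes %3A, "|" becomes %7C)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0("http://en.wikipedia.org/w/api.php?",
         paste0(names(params), "=", encoded, collapse = "&"))
}

# Hypothetical helper: pull the user names out of a parsed response,
# i.e. the nested list that rjson::fromJSON returns.
extract_users <- function(json_data) {
  vapply(json_data$query$recentchanges, function(x) x$user, character(1))
}

# Hypothetical helper: download and parse in one step (needs network access
# and the rjson package).
fetch_anon_users <- function(url = build_rc_url()) {
  extract_users(rjson::fromJSON(file = url))
}
```

Calling fetch_anon_users() should then return only the users of local anonymous edits. Note that rctype=edit also leaves out page creations; if those matter, passing rctype = "edit|new" to build_rc_url is probably what you want, but check that against the API documentation.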

svick