
We now have code that scrapes the links in an article. We also need the number of clicks on each link. Can someone help? So far we have this code:

String[] articles = {"Abdominal_pain"};

void setup() {

    for (int i = 0; i < articles.length; i++) {

        String article = articles[i];
        String start = "20160101"; // YYYYMMDD
        String end = "20170101"; // YYYYMMDD

        // documentation: https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
        // >> https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&meta=&titles=Albert+Einstein&pllimit=500
        String query = "https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&meta=&titles="+article+"&pllimit=500";

        String[] lines = loadStrings(query);

        for (int j = 0; j < lines.length; j++) {
            String line = lines[j];

            if (line.contains("\"title\":")) {
                println(line);
                // java string split
            }
        }
    }
}
Kevin Workman
me-esmee

1 Answer


The query you're using apparently gives you a bunch of articles that your main article "Abdominal_pain" links to.

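You need to go a step further and loop through all of those links. You can make your life a lot easier by using JSONObjects instead of parsing Strings like you're currently doing. Check out the loadJSONObject() function for more info, but basically you'd do this (in the response, the links array sits under "query", then "pages", then the page id):

JSONObject response = loadJSONObject(query);
JSONObject pages = response.getJSONObject("query").getJSONObject("pages");
for (Object pageId : pages.keys()) {
   // each page entry holds the "links" array for that article
   JSONArray links = pages.getJSONObject((String) pageId).getJSONArray("links");
   for (int i = 0; i < links.size(); i++) {
      JSONObject link = links.getJSONObject(i);
      String title = link.getString("title");
      //fetch the info for that title
   }
}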

Once you have the title, you can then fetch the information for that page. An example query url is https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Abdominal_pain/daily/20151010/20151012 which returns this JSON:

{"items":[{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101000","access":"all-access","agent":"all-agents","views":1134},{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101100","access":"all-access","agent":"all-agents","views":1160},{"project":"en.wikipedia","article":"Abdominal_pain","granularity":"daily","timestamp":"2015101200","access":"all-access","agent":"all-agents","views":1313}]}

You'll have to do some aggregating to get the totals, or maybe the total is somewhere else in the API.
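
As a rough sketch of that aggregation step, the plain-Java class below (written so it can run outside the Processing IDE) builds the request URL for a title and adds up the "views" values. The class name PageviewTotals and both method names are made up for illustration. The regular expression is only a shortcut to keep the example dependency-free, and it assumes the flat response shape shown above; inside a sketch, the JSONObject calls would be the sturdier choice. Link titles come back with spaces, so the sketch swaps them for underscores before encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not part of any API.
class PageviewTotals {
    // Base path copied from the example URL above.
    static final String BASE =
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        + "en.wikipedia/all-access/all-agents/";

    // Builds the daily pageviews URL for one article title (dates as YYYYMMDD).
    static String pageviewsUrl(String title, String start, String end) {
        try {
            String encoded = URLEncoder.encode(title.replace(' ', '_'), "UTF-8");
            return BASE + encoded + "/daily/" + start + "/" + end;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Adds up every "views" number found in the response text.
    static long sumViews(String json) {
        Matcher m = Pattern.compile("\"views\"\\s*:\\s*(\\d+)").matcher(json);
        long total = 0;
        while (m.find()) {
            total += Long.parseLong(m.group(1));
        }
        return total;
    }

    public static void main(String[] args) {
        // Trimmed copy of the three daily entries from the sample response.
        String sample = "{\"items\":[{\"views\":1134},{\"views\":1160},{\"views\":1313}]}";
        System.out.println(pageviewsUrl("Abdominal pain", "20151010", "20151012"));
        System.out.println(sumViews(sample)); // 3607
    }
}
```

In a sketch you would pass the text returned for each link title to sumViews, keeping in mind that this gives total page views per article rather than clicks from one specific page.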

You're going to have to do a little bit of research on exactly what the API can return. Reading through the documentation is a big part of programming. Luckily the Wikipedia API has great documentation, and that's where you should be looking.

I'd recommend trying something out and posting another question, along with an MCVE, if you get stuck. Good luck.

See also: How to use Wikipedia API to get the page view statistics of a particular page in wikipedia?

Kevin Workman
    That will give the total page views on those articles and not only the number of clicks from the selected article (which by the way is data that is not available). – Ainali May 12 '16 at 15:14