How to find the most viewed Wikipedia pages for today's date using pageviewapi in Python

Question

I'm working on a project that needs to find the top Wikipedia pages for today. It uses the example code from the pageviewapi GitHub page, except that it substitutes in today's date. The original code on GitHub is this:

import pageviewapi
pageviewapi.top('fr.wikipedia', 2015, 11, 14, access='all-access')

My code looks like this:

from datetime import datetime
currentDay = datetime.now().day
currentMonth = datetime.now().month
currentYear = datetime.now().year
import pageviewapi
toppages = pageviewapi.top('fr.wikipedia', currentYear, currentMonth, currentDay, access='all-access')
print(toppages)

This just returns a long error (shown below). What is the issue with using this code, and why doesn't a result show up?

Error:

Traceback (most recent call last):
  File "main.py", line 54, in <module>
    toppages = pageviewapi.top('fr.wikipedia', currentYear, currentMonth, currentDay, access='all-access')
  File "/home/runner/wikipedia-game/venv/lib/python3.8/site-packages/pageviewapi/client.py", line 94, in top
    return api(TOP_ENDPOINT, args)
  File "/home/runner/wikipedia-game/venv/lib/python3.8/site-packages/pageviewapi/client.py", line 154, in api
    response.raise_for_status()
  File "/home/runner/wikipedia-game/venv/lib/python3.8/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://wikimedia.org/api/rest_v1/metrics/pageviews/top/fr.wikipedia/all-access/2022/3/24

score 3 · Answer 1 · answered Mar 24 '22 at 20:51

There are two problems. First, the API expects a zero-padded month (03, not 3), so use:

currentMonth = str(datetime.now().month).zfill(2)

Second, today's pageviews don't exist yet, so the best you can do is request yesterday's. A simple version (which breaks on the first of the month, when the day becomes 0) is:

currentDay = datetime.now().day-1
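
A slightly more robust way to get yesterday's date is to subtract a timedelta from the whole date, so month and year boundaries roll over correctly, and to zero-pad both month and day. The sketch below is only an illustration: yesterday_parts is a made-up helper name, and passing strings to pageviewapi.top is an assumption based on the zfill line above, not something checked against the library's documentation.

```python
from datetime import date, timedelta


def yesterday_parts(today=None):
    """Return (year, month, day) strings for the day before `today`.

    Month and day are zero-padded to two digits ("03", not "3").
    `yesterday_parts` is a hypothetical helper, not part of pageviewapi.
    """
    today = today or date.today()
    # Subtracting a timedelta handles month and year rollovers for us.
    yesterday = today - timedelta(days=1)
    return (
        yesterday.strftime("%Y"),
        yesterday.strftime("%m"),
        yesterday.strftime("%d"),
    )


# Usage (needs network access and the third-party pageviewapi package;
# assumes top() accepts strings for the date parts):
#
#   import pageviewapi
#   year, month, day = yesterday_parts()
#   toppages = pageviewapi.top('fr.wikipedia', year, month, day, access='all-access')
```

Note that date.today() uses the local time zone, so depending on where you are, the data for "yesterday" may not have been published yet and the request can still fail.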

smartse


Thank you so much! However, when I print the list of top pages, it gives me all 1000 of them. How do I make it provide only one of them, at random? I tried something like print(toppages['rank'][4]), with 4 standing in for a random number. – George Mar 25 '22 at 03:54
For that you need print(toppages['items'][0]['articles'][4]). If you use an IDE like Spyder, working this out for yourself through the variable explorer is easy. – smartse Mar 25 '22 at 10:00
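
To pick one entry at random instead of hard-coding the index 4, something like the following should work. It is a minimal sketch: pick_random_article is a made-up helper name, and the sample dictionary is invented data that only mimics the toppages['items'][0]['articles'] layout from the comment above, so it runs without calling the API.

```python
import random


def pick_random_article(toppages):
    """Return one randomly chosen entry from the list of top articles.

    Assumes the response layout used above:
    toppages['items'][0]['articles'] is a list of article entries.
    """
    articles = toppages['items'][0]['articles']
    return random.choice(articles)


# Invented sample data standing in for a real API response.
sample = {
    'items': [
        {
            'articles': [
                {'article': 'Example_A', 'views': 300, 'rank': 1},
                {'article': 'Example_B', 'views': 200, 'rank': 2},
                {'article': 'Example_C', 'views': 100, 'rank': 3},
            ]
        }
    ]
}

choice = pick_random_article(sample)
print(choice['article'], choice['views'])
```

With a real response you would pass toppages in place of sample.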
