I want to use Petscan https://petscan.wmflabs.org/ to find articles that belong to certain categories, etc... It is easy to do this with the website they provide. I was wondering if there is a way to do this in Python. I create a URL to send the search query based on parameters given by the user, and it returns a list of Wikipedia article titles.
Asked
Active
Viewed 323 times
3
-
Yes, that's possible. There might be some REST API to pass your keywords. Or if there are transmitted using a GET request, you can copy the link from the browser. For both cases you could use`requests` and probably `json`. – Joe Jan 13 '20 at 18:28
1 Answers
1
If you create a query it links to the URL sent, which contains all the parameters, above the table of results: Link to a pre-filled form for the query you just ran with and without auto-run. PSID is ....
Here's an example of how you can access a query programmatically and where the answers are found in the json returned:
import requests
import json
petscan = requests.get('https://petscan.wmflabs.org/?max_sitelink_count=&categories=cats&project=wikipedia&language=en&cb_labels_yes_l=1&edits%5Bflagged%5D=both&edits%5Bbots%5D=both&search_max_results=500&cb_labels_any_l=1&cb_labels_no_l=1&format=json&interface_language=en&edits%5Banons%5D=both&ns%5B0%5D=1&&doit=').json()
table = petscan['*'][0]['a']['*']

smartse
- 1,026
- 7
- 12
-
Thank you! How would I do it if I have previously defined variables and I want to insert them into the URL. Does the order matter? – Jan 15 '20 at 10:38
-
-
It should be noted that for this solution to work "&format=json" should be inserted manually in the url, because the link provided by petscan will return the html result if left as it is. If there is an easier way, I haven't been able to find it. – Pere Aug 15 '20 at 10:06
-
The URL already contains that doesn't it? That was generated by going to Output > Format and selecting JSON. – smartse Aug 25 '20 at 18:37