3

I want to use Petscan https://petscan.wmflabs.org/ to find articles that belong to certain categories, etc... It is easy to do this with the website they provide. I was wondering if there is a way to do this in Python. I create a URL to send the search query based on parameters given by the user, and it returns a list of Wikipedia article titles.

  • Yes, that's possible. There might be some REST API to pass your keywords. Or if there are transmitted using a GET request, you can copy the link from the browser. For both cases you could use`requests` and probably `json`. – Joe Jan 13 '20 at 18:28

1 Answers1

1

If you create a query it links to the URL sent, which contains all the parameters, above the table of results: Link to a pre-filled form for the query you just ran with and without auto-run. PSID is ....

Here's an example of how you can access a query programmatically and where the answers are found in the json returned:

import requests
import json

petscan = requests.get('https://petscan.wmflabs.org/?max_sitelink_count=&categories=cats&project=wikipedia&language=en&cb_labels_yes_l=1&edits%5Bflagged%5D=both&edits%5Bbots%5D=both&search_max_results=500&cb_labels_any_l=1&cb_labels_no_l=1&format=json&interface_language=en&edits%5Banons%5D=both&ns%5B0%5D=1&&doit=').json()
table = petscan['*'][0]['a']['*']
smartse
  • 1,026
  • 7
  • 12
  • Thank you! How would I do it if I have previously defined variables and I want to insert them into the URL. Does the order matter? –  Jan 15 '20 at 10:38
  • I'm not certain, but the order shouldn't make a difference. – smartse Jan 15 '20 at 11:07
  • It should be noted that for this solution to work "&format=json" should be inserted manually in the url, because the link provided by petscan will return the html result if left as it is. If there is an easier way, I haven't been able to find it. – Pere Aug 15 '20 at 10:06
  • The URL already contains that doesn't it? That was generated by going to Output > Format and selecting JSON. – smartse Aug 25 '20 at 18:37