Why does a Wikipedia API call in Python raise a TypeError?

Question

I'm trying to get all the past revisions (edits) of a certain Wikipedia article using the MediaWiki API. This code should retrieve all the edits made to the FDR Wikipedia page. Here is the code I wrote to do this:

import re
import requests

def GetRevisions():
    url = "https://en.wikipedia.org/w/api.php?action=query=Franklin%20Delano%20Roosevelt=revisions&rvlimit=500&titles=" 

    while True:
        joan = requests.get(url)
        revisions = []                                        
        revisions += re.findall('<continue rvcontinue="([^"]+)"',joan)

        cont = re.search('<continue rvcontinue="([^"]+)"',joan)
        if not cont:
            break
    return revisions

The problem I keep running into is this error: TypeError: expected string or buffer. I'm not sure why this error keeps showing up. Can anyone please give guidance on how to remedy this?

@Jan can you please suggest an alternative to parsing HTML with regular expressions? — dabberson567, Feb 17 '18 at 18:16
From the SO hall of fame: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — user4601931, Feb 17 '18 at 18:23

score 2 · Accepted Answer · answered Feb 17 '18 at 18:53

re.findall('<continue rvcontinue="([^"]+)"',joan)

joan (who is Joan??) is a Response object returned by requests.get(), not a string. You can't apply regular expressions to it directly; the body of the response as a string is in joan.text.
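To make the type problem concrete, here is a minimal sketch that runs without a network call. FakeResponse is a hypothetical stand-in that only mimics the .text attribute of a real requests response, and the XML snippet and rvcontinue value are made up for illustration.

```python
import re

class FakeResponse:
    """Hypothetical stand-in for a requests response; only mimics .text."""
    def __init__(self, text):
        self.text = text

# Made-up body, shaped like the XML the question's regex expects
resp = FakeResponse('<continue rvcontinue="20020101|12345" />')
pattern = r'<continue rvcontinue="([^"]+)"'

# Passing the object itself fails: re only accepts strings (or bytes)
try:
    re.findall(pattern, resp)
    raised = False
except TypeError:
    raised = True

# Passing the string body works
matches = re.findall(pattern, resp.text)
```

With a real response the same distinction applies: the regex functions need the string body, not the response object.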

Additionally, the MediaWiki API URL you're using is malformed. It returns an error, not the data you're looking for.

You can avoid the problem entirely by requesting a JSON response from the MediaWiki API (format=json) and parsing it using .json(), as seen below. Note that I'm using a dictionary to pass parameters to the API -- this means we don't have to escape query strings, and makes it easier to update the query with continue parameters…

import requests

url = "https://en.wikipedia.org/w/api.php"
query = {
    "format": "json",
    "action": "query",
    "titles": "Franklin Delano Roosevelt",
    "prop": "revisions",
    "rvlimit": 500,
}

while True:
    # requests URL-encodes the params dict and .json() decodes the reply
    r = requests.get(url, params=query).json()
    print(repr(r))  # Insert your own code to parse the response here
    if 'continue' in r:
        # Merge the continuation parameters into the next request
        query.update(r['continue'])
    else:
        break
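As a rough sketch of what "parsing the response" could mean here: the decoded JSON is an ordinary Python dictionary, and the revisions can be pulled out of it with normal dict and list operations. The helper name extract_revisions and the sample data below are made up for illustration, and the layout (revisions nested under query, then pages, keyed by page ID) assumes the API's default JSON format.

```python
def extract_revisions(response):
    """Return the revision dicts found in one decoded API response."""
    revisions = []
    # Assumed layout: response["query"]["pages"][<page id>]["revisions"]
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions.extend(page.get("revisions", []))
    return revisions

# Made-up sample shaped like one batch of results (not real data)
sample = {
    "continue": {"rvcontinue": "20180101000000|111", "continue": "||"},
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "title": "Example",
                "revisions": [
                    {"revid": 222, "parentid": 111, "user": "ExampleUser",
                     "timestamp": "2018-02-17T18:00:00Z", "comment": "fix typo"},
                    {"revid": 111, "parentid": 0, "user": "OtherUser",
                     "timestamp": "2018-01-01T00:00:00Z", "comment": "create page"},
                ],
            }
        }
    },
}

revs = extract_revisions(sample)
timestamps = [rev["timestamp"] for rev in revs]
```

In the loop above, the print line could then be replaced with something like all_revisions.extend(extract_revisions(r)), where all_revisions is a list created before the loop.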

This looks great. I'm a novice at calling and using APIs, so what do you mean by parsing the response? — dabberson567, Feb 17 '18 at 21:48
