2

I am trying to access old version of Wiki pages using data instead of "oldid". Usually to access and a version of a wiki page, I have to use the page id like this https://en.wikipedia.org/w/index.php?title=Main_Page&oldid=969106986, is there a way to access the same page using the date without knowing the ID? If i know for example that there is a version of the page published on "12:44, 23 July 2020‎ "

Martin Urbanec
  • 426
  • 4
  • 11
  • 2
    If you are fine with using the API, you can call for example https://en.wikipedia.org/w/api.php?action=query&rvslots=main&prop=revisions&titles=Earth&rvlimit=1&rvprop=content|timestamp&rvstart=2020-05-01T00:00:00Z – Pascalco Sep 11 '20 at 17:52
  • Just in case you are actually wanting to use the main page, beware that the content is transcluded from templates, so your link won't show how the page actually looked on 23 July. – smartse Sep 14 '20 at 15:30

3 Answers3

1

In addition to the "main" API (called the action API by MediaWiki developers), you can also use the REST API. It may or may not be enabled at all wikis, but if you intend to query Wikipedia content.

The revision module of the \action API (linked to in @amirouche's answer) allows you to get the wikitext format of a page. That is the source format that is used by MediaWiki, and it isn't easy to get a HTML from it, which can be easier to analyze (especially if you do ĺingquistic analytics, for instance).

If HTML would be better for your use case, you can use the REST API, see https://en.wikipedia.org/api/rest_v1/#/. For instance, if you're interested in English Wikipedia's Main Page as of July 2008, you can use https://en.wikipedia.org/api/rest_v1/page/html/Main_Page/223883415.

The number (223883415) is the revision ID, which you can get through the action API.

However, keep in mind that re-parses the revision's wikitext into HTML. That means it doesn't need to be exactly what showed as of the date the revision was saved. For instance, the wikitext can contain conditions on current date (that is used for automatically upating the mainpage). If you're intereted in seeing that, you would need to use archive.org.

Martin Urbanec
  • 426
  • 4
  • 11
1

Here is a Python code snippet for this:

import requests
import json

S = requests.Session()

URL = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    "action": "query",
    "prop": "revisions",
    "titles": "Canada",
    "rvlimit": "5",
    "rvprop": "timestamp|user|comment|content",
    "rvdir": "newer",
    "rvstart": "2018-01-01T00:00:00Z",
    "rvend": "2019-01-01T00:00:00Z",
    "rvslots": "main",
    "formatversion": "2",
    "format": "json"
}

R = S.get(url=URL, params=PARAMS)
data = R.json()

This retrieves the first 5 revisions of the Wikipedia article on Canada that occurred after January 1, 2018. The content of the first revision can be found at data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"].

flutillie
  • 554
  • 1
  • 7
  • 19
0

You can use the MediaWiki API to get revision; refer to the documentation at: https://www.mediawiki.org/wiki/API:Revisions.

You need to map revision ids with dates. It will be straightforward :).

Martin Urbanec
  • 426
  • 4
  • 11
amirouche
  • 7,682
  • 6
  • 40
  • 94
  • By following the link, I can see that one can filter there "by date and user". Would you mind adding a bit more information from the link to your answer so that it becomes more self-contained? – Lover of Structure Apr 29 '23 at 14:56