
I'm trying to get all the content from Wikipedia:Unusual_articles. I'm able to get the table of contents (the list of sections) by calling this endpoint:

https://en.wikipedia.org/w/api.php?action=parse&format=json&prop=sections&page=Wikipedia:Unusual_articles

and the data I get back looks something like this:

{
    title: "Wikipedia:Unusual articles",
    pageid: 154126,
    sections: [
        {
            toclevel: 1,
            level: "2",
            line: "Places and infrastructure",
            number: "1",
            index: "T-1",
            fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
            byteoffset: null,
            anchor: "Places_and_infrastructure"
        },
        {
            toclevel: 2,
            level: "3",
            line: "Americas",
            number: "1.1",
            index: "T-2",
            fromtitle: "Wikipedia:Unusual_articles/Places_and_infrastructure",
            byteoffset: null,
            anchor: "Americas"
        },
...

But I'm not able to get the content of a particular section. For example, the Americas section contains a table in which each row has a link and a short description. Is there a way to obtain the link and short description from the API?


Termininja
John Lim
  • I'd suggest reading the API documentation and figuring out which API call will give you article content. – miken32 Oct 24 '16 at 19:29
  • Your best bet is probably to parse the table HTML. The API call is almost right, you are just using the wrong property. – Tgr Oct 30 '16 at 06:06
  • @Tgr what props am I supposed to use to get the table html? – John Lim Oct 30 '16 at 08:25
  • Try [this query](https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=json&page=Wikipedia%3AUnusual_articles%2FPlaces_and_infrastructure&prop=text&section=2) (the table is transcluded from a subpage). In general, ApiSandbox is the easy way to find out what parameters you need. – Tgr Oct 30 '16 at 19:55

1 Answer


You can get the content of any page section with the MediaWiki API's action=parse in two steps. First, get all the sections of the page with:

https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=Wikipedia:Unusual_articles

From the response you can see that the section Americas has index=T-2 (the T prefix means the section comes from a transcluded page) and fromtitle=Wikipedia:Unusual_articles/Places_and_infrastructure. Now use this index and fromtitle to get the content of the section with:

https://en.wikipedia.org/w/api.php?action=parse&page=Wikipedia:Unusual_articles/Places_and_infrastructure&section=2&prop=...

where:

  • prop=wikitext - gives the original wikitext of the section, before parsing.
  • prop=text - gives the section's wikitext parsed to HTML.
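The two steps above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: the helper names (`build_url`, `section_request`, `call_api`, `get_section`), the User-Agent string, and the use of `formatversion=2` are my own assumptions and are not part of the calls shown above. It also assumes every entry in `sections` carries a usable `fromtitle`, as in the response from the question.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_url(**params):
    """Build an action=parse URL; formatversion=2 is an assumption of this sketch."""
    query = {"action": "parse", "format": "json", "formatversion": "2", **params}
    return API_URL + "?" + urllib.parse.urlencode(query)


def section_request(section, prop="text"):
    """Build the step-two URL from one entry of the 'sections' list."""
    # Transcluded sections carry a "T-" prefix; the number after it is the
    # section index inside the page named by 'fromtitle'.
    index = section["index"]
    number = index[2:] if index.startswith("T-") else index
    return build_url(page=section["fromtitle"], section=number, prop=prop)


def call_api(url):
    """Fetch a URL and return the 'parse' object of the JSON response."""
    # The User-Agent value is a placeholder; replace it with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "section-fetch-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["parse"]


def get_section(page, heading, prop="text"):
    """Step 1: list the sections. Step 2: fetch the one whose heading matches."""
    sections = call_api(build_url(page=page, prop="sections"))["sections"]
    match = next(s for s in sections if s["line"] == heading)
    content = call_api(section_request(match, prop))[prop]
    # Without formatversion=2 the content is wrapped as {"*": "..."}.
    return content["*"] if isinstance(content, dict) else content
```

With these helpers, `get_section("Wikipedia:Unusual_articles", "Americas")` would request the section list first and then the Americas section from the subpage; pass `prop="wikitext"` to get the unparsed source instead of HTML.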
Termininja
  • I'm able to get the section details from the above API by passing the section index, but it returns HTML text. I want only plain text. How can I get it? – Harshal Bhatt Nov 07 '17 at 04:21
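One way to turn the HTML from prop=text into plain text, and to recover the link and short description asked about in the question, is to parse the table rows on the client side. The following is a rough sketch using Python's built-in `html.parser`. The `RowParser` class and `table_rows` function are hypothetical names, and the assumption that each row holds a link followed by a description cell is mine, so check it against the real output. Nested tables are not handled.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Collect, for each table row, the first link target and the cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None  # the row currently being read, or None outside <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {"href": None, "cells": []}
        elif self._row is not None:
            if tag in ("td", "th"):
                self._row["cells"].append("")
            elif tag == "a" and self._row["href"] is None:
                self._row["href"] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text is appended to the most recently opened cell; markup is dropped.
        if self._row is not None and self._row["cells"]:
            self._row["cells"][-1] += data


def table_rows(html):
    """Return (first_link_href, [plain-text cells]) for every table row."""
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return [
        (row["href"], [" ".join(cell.split()) for cell in row["cells"]])
        for row in parser.rows
    ]
```

For example, feeding it `'<table><tr><td><a href="/wiki/Example">Example</a></td><td>A <b>short</b> description.</td></tr></table>'` yields `[("/wiki/Example", ["Example", "A short description."])]`. The href values are relative, so prefix them with the site's base URL if you need absolute links.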