When calling the Wikipedia API, what do the keys in the links objects mean?

  • I'm guessing ns stands for namespace, but why is it an integer?
  • Why is exists empty for every object?
  • Why is the key for what appears to be the page title called *?

For example, calling:

https://en.wikipedia.org/w/api.php?action=parse&page=List_of_cognitive_biases&prop=links

Response:

{
    "parse": {
        "title": "List of cognitive biases",
        "pageid": 510791,
        "links": [{
            "ns": 0,
            "exists": "",
            "*": "Anthropomorphism"
        }, {
            "ns": 0,
            "exists": "",
            "*": "Apophenia"
        }, 
        ...
        ]
    }
}
Termininja
kal
  • You'll get saner output with [formatversion=2](https://en.wikipedia.org/w/api.php?action=parse&page=List_of_cognitive_biases&prop=links&formatversion=2). The default JSON format does weird things for backwards compatibility. – Tgr Nov 15 '16 at 19:57
  • thanks @Tgr! where is this documented? I am having trouble tracking down this knowledge. – kal Nov 18 '16 at 00:07
  • https://www.mediawiki.org/wiki/API:Data_formats#JSON_parameters – Termininja Nov 18 '16 at 08:04

2 Answers

You are right: ns stands for namespace, and all "35 namespaces in Wikipedia are numbered for programming purposes...".

The empty exists means that the linked page exists on Wikipedia. If the linked page doesn't exist (the link is a redlink), this key will be missing (see, for example, the output for Wikipedia:Most-wanted articles).
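
If you are consuming the default (legacy) JSON in code, a small normalization step can hide these quirks. The following is only a sketch: the Link class and normalize_link function are names made up for this example, and the second sample entry is an invented redlink, not real API output.

```python
from dataclasses import dataclass


@dataclass
class Link:
    ns: int       # numeric namespace ID
    title: str    # page title, stored under "*" in the legacy format
    exists: bool  # True when the "exists" key is present at all


def normalize_link(raw: dict) -> Link:
    # The legacy format signals a boolean by the key's presence, so test
    # membership rather than truthiness: the value itself is "".
    return Link(ns=raw["ns"], title=raw["*"], exists="exists" in raw)


# Sample data: the first entry is taken from the question, the second is
# an invented redlink used purely for illustration.
sample = [
    {"ns": 0, "exists": "", "*": "Anthropomorphism"},
    {"ns": 0, "*": "Hypothetical missing page"},
]
links = [normalize_link(item) for item in sample]
```

Checking "exists" in raw rather than raw.get("exists") matters here, because the empty string is falsy in Python.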

By the way, you can get the same result in a more compact form by using action=query:

https://en.wikipedia.org/w/api.php?action=query&titles=List_of_cognitive_biases&prop=links&pllimit=500

For your example the result will be:

"links": [
    {
        "ns": 0,
        "title": "Anthropomorphism"
    },{
        "ns": 0,
        "title": "Apophenia"
    },
    ...
]
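
Note that pllimit=500 caps a single response, so a page with more links needs follow-up requests. Below is a rough sketch of how that loop could look. It assumes the response carries a "continue" object whose values are merged into the next request, and that "pages" is keyed by page ID as in the default JSON format. The names fetch_json and iter_link_titles, the explicit format=json parameter and the User-Agent string are my own additions, not part of the URL above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params: dict) -> dict:
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urlencode(params)
    # Hypothetical User-Agent; replace it with one describing your client.
    request = Request(url, headers={"User-Agent": "links-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def iter_link_titles(page_title: str, fetch=fetch_json):
    """Yield the titles of all links on a page, following continuation."""
    params = {
        "action": "query",
        "titles": page_title,
        "prop": "links",
        "pllimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        # In the default JSON format, "pages" is keyed by page ID.
        for page in data.get("query", {}).get("pages", {}).values():
            for link in page.get("links", []):
                yield link["title"]
        if "continue" not in data:
            break
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

Passing the fetch function as a parameter keeps the paging logic testable without network access: list(iter_link_titles("List_of_cognitive_biases")) performs real requests, while a stub can be supplied in tests.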
Termininja

The property names might make more sense when you learn that the API internally uses a result format that is primarily designed for XML output, not the JSON format you're viewing. If you look at your query results in XML, they look like this:

<parse title="List of cognitive biases" pageid="510791">
  <links>
    <pl ns="14" exists="" xml:space="preserve">Category:Articles with unsourced statements from November 2013</pl>
    <pl ns="10" exists="" xml:space="preserve">Template:Biases</pl>
    …
    <pl ns="0" exists="" xml:space="preserve">Academic bias</pl>

Now, to your questions.

I'm guessing ns stands for "namespace"?

Yes.

but why is it an integer?

Because it's the namespace ID. A namespace's name can change or gain aliases, so the numeric ID is the stable way to refer to it. You can query the names and aliases through the API.
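
For instance, a request with action=query&meta=siteinfo&siprop=namespaces returns the namespace table, and you can turn it into an ID-to-name lookup. This is a minimal sketch that assumes the default (legacy) JSON shape, where the local name sits under "*". The function name namespace_names is invented here, and the sample response is abridged and illustrative rather than copied from a real reply.

```python
def namespace_names(siteinfo: dict) -> dict:
    """Map namespace ID -> local name from a legacy-format siteinfo response."""
    namespaces = siteinfo["query"]["namespaces"]
    # Keys are the IDs as strings; the local name sits under "*".
    return {int(ns_id): info["*"] for ns_id, info in namespaces.items()}


# Abridged, illustrative response (real ones carry more keys per entry).
sample = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "10": {"id": 10, "*": "Template", "canonical": "Template"},
            "14": {"id": 14, "*": "Category", "canonical": "Category"},
        }
    }
}
names = namespace_names(sample)
```

The main (article) namespace has ID 0 and an empty name, which is why its titles appear without a prefix.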

Why is exists empty for every object?

Because it's a boolean attribute. It would be missing completely when the linked page doesn't exist.

Why is the key for what appears to be the page title called *?

Because it holds the text content of the XML element; the legacy JSON format stores that content under the * key.
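
To make the correspondence concrete, here is a sketch that converts pl elements into dicts shaped like the JSON in the question. It illustrates the mapping only and is not MediaWiki's actual code. The function name element_to_legacy_dict is made up, the sample is a shortened, well-formed stand-in for the XML quoted above, and dropping the xml:space attribute is my own choice to match the JSON output.

```python
import xml.etree.ElementTree as ET

# Well-formed, shortened stand-in for the XML output quoted above.
SAMPLE_XML = """
<parse title="List of cognitive biases" pageid="510791">
  <links>
    <pl ns="10" exists="" xml:space="preserve">Template:Biases</pl>
    <pl ns="0" exists="" xml:space="preserve">Academic bias</pl>
  </links>
</parse>
"""

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def element_to_legacy_dict(element: ET.Element) -> dict:
    """Copy attributes to keys and put the element's text under "*"."""
    result = {k: v for k, v in element.attrib.items() if k != XML_SPACE}
    if "ns" in result:
        result["ns"] = int(result["ns"])  # attributes are always strings
    result["*"] = element.text or ""
    return result


root = ET.fromstring(SAMPLE_XML.strip())
links = [element_to_legacy_dict(pl) for pl in root.iter("pl")]
```

Each attribute becomes a key, the boolean attribute exists keeps its empty value, and the element's text ends up under "*", which is the shape shown in the question.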

Bergi