
I'm experimenting with the Python module wikipedia, which is a wrapper for the Wikipedia API. In particular I'm looking at the links API, which as I understand it should return a 'List of titles of Wikipedia page links on a page', i.e. all the references to other Wikipedia pages within the text of the page I'm querying. When I look at the result for the article on Google, I get a list of links as expected (Wikipedia titles in JSON format). The problem is that some of the links listed there do not seem to appear on the Google page. I thought it might be including links to Google, but that doesn't explain it either: the third link returned in the JSON structure is to ADATA, and I don't see a link to ADATA anywhere on the Google page, nor a link to Google anywhere on the ADATA page. Is this a bug, or am I missing something obvious?

I believe this link is enough to reproduce the issue:

https://en.wikipedia.org/w/api.php?action=query&titles=Google&prop=links

The result I see looks like this:

{
    "continue": {
        "plcontinue": "1092923|0|Aardvark_(search_engine)",
        "continue": "||"
    },
    "query": {
        "pages": {
            "1092923": {
                "pageid": 1092923,
                "ns": 0,
                "title": "Google",
                "links": [
                    {
                        "ns": 0,
                        "title": "111 Eighth Avenue"
                    },
                    {
                        "ns": 0,
                        "title": "2600: The Hacker Quarterly"
                    },
                    {
                        "ns": 0,
                        "title": "ADATA"
                    },
. . .

In Python you can reproduce it like this:

import wikipedia
wikipedia.page('Google').links

which produces output like this:

['111 Eighth Avenue',
 '2600: The Hacker Quarterly',
 'ADATA',
 'AI Challenge',
 'AKM Semiconductor, Inc.',
 'AOL',
 'API.AI',
OldGeeksGuide
  • You may want to post your code, so that others can try to replicate. – perfect5th Jul 03 '17 at 22:17
  • Shouldn't the list be massive? Why are there only a handful of links? – cs95 Jul 03 '17 at 22:23
  • By default it returns the first 10 links, I believe – OldGeeksGuide Jul 03 '17 at 22:24
  • In python, it returns over a thousand, but I don't know if that's all of them. – OldGeeksGuide Jul 03 '17 at 22:24
  • What do you mean under "_there seem to be links listed there that do not appear on the Google page_"? The [ADATA](https://en.wikipedia.org/wiki/ADATA) link is right there on the [Google](https://en.wikipedia.org/wiki/Google) wiki page in the `Links to related articles -> Major information technology companies -> Information storage` list. – zwer Jul 03 '17 at 22:47
  • Ah. That's not visible when I visit the page and do a find on the page, nor do I see it when I try editing the page. I don't see it until I click on the "Links to related articles" show/hide button. Thanks for the info. As I said in the question, 'am I missing something obvious?' :-) – OldGeeksGuide Jul 03 '17 at 22:56

2 Answers


The list contains links which appear in the wikitext of the page or in templates called from the wikitext. It is updated by a queued job after every edit. Due to the async nature of job handling and the finite number of retries for failed jobs, it is possible for the list to differ from actual article content, but very unlikely. (It's probably possible to add links to wikitext in such a way that they don't show up in the article HTML at all, but again it's unlikely anyone would actually do that.)
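
As the comments note, a plain request returns only the first batch of links, and the response's "continue" object carries a plcontinue token for the next batch. Here is a minimal sketch of collecting the whole list with only the standard library. The helper names fetch_json and all_links and the User-Agent string are my own placeholders, not part of the wikipedia module.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace it with one describing your client.
    request = urllib.request.Request(url, headers={"User-Agent": "links-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def all_links(title, fetch=fetch_json):
    """Collect every link title recorded for `title`, following continuation."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "links",
        "titles": title,
        "pllimit": "max",  # ask for the largest batch the API allows per request
    }
    links = []
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", {}).values():
            links.extend(link["title"] for link in page.get("links", []))
        if "continue" not in data:
            return links
        # Merge the continuation tokens (e.g. plcontinue) into the next request.
        params = {**params, **data["continue"]}
```

Calling all_links('Google') and testing `'ADATA' in result` should then tell you whether the link is recorded at all, independent of where it is rendered on the page. The fetch argument exists only so the loop can be exercised without network access.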

Tgr

Some parts of the page are collapsed by default, so they are not visible when you first visit it. In this example, the link appears when you click the 'show' button for "Major information technology companies" at the bottom of the page. I believe this accounts for what I was seeing.
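
To check for one specific link without scanning the whole list, the links query also accepts a pltitles parameter that restricts the result to the titles you name. This is a rough sketch under that assumption. The function names and the User-Agent string are made up for illustration, and the comparison expects the target title exactly as Wikipedia spells it.

```python
import json
import urllib.parse
import urllib.request


def build_check_url(source, target):
    """Build a links query for `source` restricted to the single title `target`."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "links",
        "titles": source,
        "pltitles": target,  # only report links pointing at this title
    }
    return "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def response_has_link(data, target):
    """Return True if the decoded API response lists a link to `target`."""
    pages = data.get("query", {}).get("pages", {})
    return any(
        link.get("title") == target
        for page in pages.values()
        for link in page.get("links", [])
    )


def page_links_to(source, target):
    """Ask the live API whether `source` has a recorded link to `target`."""
    # Placeholder User-Agent (an assumption); replace it with one describing your client.
    request = urllib.request.Request(
        build_check_url(source, target),
        headers={"User-Agent": "links-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return response_has_link(json.load(response), target)
```

For the case in the question, the call would be page_links_to('Google', 'ADATA').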

Thanks to zwer in the comments for pointing out where to find the link.

OldGeeksGuide