
I'm trying to get an image from a Wikipedia article. I have the title of the article, but it seems like I need to know the pageid to access the thumbnail. How do I get the pageid from the title?

My JavaScript code:

$.getJSON("http://en.wikipedia.org/w/api.php?action=query&titles=" + article + "&prop=pageimages&format=json&pithumbsize=350", function (data) {
    imageURL = data.query.pages[/* pageid */].thumbnail.source;
});

Here's what I'm parsing (example for article = "Car"):

{"query":{"pages":{"13673345":{"pageid":13673345,"ns":0,"title":"Car","thumbnail":{"source":"http://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Benz-velo.jpg/100px-Benz-velo.jpg","width":100,"height":80},"pageimage":"Benz-velo.jpg"}}}}

It seems like I first need to know that the entry is keyed by 13673345.

Benck
  • Isn't the image right there? Remove "thumb" and the part after the true filename, and that's your image: https://upload.wikimedia.org/wikipedia/commons/1/1e/Benz-velo.jpg – Mike 'Pomax' Kamermans May 02 '15 at 18:53
  • But how do I get the source without doing this: `data.query.pages[/* pageid */].thumbnail.source` ? – Benck May 02 '15 at 18:57
  • Just enumerate the object until you find the entry with the expected title (in fact, there should be only one there) – Bergi May 02 '15 at 19:03
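A minimal sketch of Bergi's suggestion, assuming `data` is the parsed JSON from the `prop=pageimages` query in the question. `findPageByTitle` is a hypothetical helper name. It also checks `query.normalized`, because the API can rewrite the requested title (for example `Stack_Overflow` becomes `Stack Overflow`, as in the response shown in the first answer), so a plain title comparison could miss the entry.

```javascript
// Sketch: find a page entry by its title instead of by its pageid.
// findPageByTitle is a hypothetical helper, not part of any API.
function findPageByTitle(data, requestedTitle) {
  var query = data.query;

  // The API may normalize the requested title and reports the mapping
  // in query.normalized as { from, to } pairs.
  var title = requestedTitle;
  (query.normalized || []).forEach(function (entry) {
    if (entry.from === requestedTitle) {
      title = entry.to;
    }
  });

  // Enumerate the page entries until one has the expected title.
  var pages = query.pages;
  var ids = Object.keys(pages);
  for (var i = 0; i < ids.length; i++) {
    if (pages[ids[i]].title === title) {
      return pages[ids[i]];
    }
  }
  return null; // no entry with that title
}

// Usage with the "Car" response from the question.
var data = {
  query: {
    pages: {
      "13673345": {
        pageid: 13673345,
        ns: 0,
        title: "Car",
        thumbnail: {
          source: "http://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Benz-velo.jpg/100px-Benz-velo.jpg",
          width: 100,
          height: 80
        },
        pageimage: "Benz-velo.jpg"
      }
    }
  }
};

console.log(findPageByTitle(data, "Car").thumbnail.source);
```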

2 Answers


The OP asks how to "access the thumbnail", i.e., its URL within the returned data. He did not ask how to get the full-size image behind the thumbnail, which is what the other answer addresses.

The OP's problem is that the data is keyed by page ID. In fact, the query could return more than one article, in which case there would be multiple page IDs and thumbnails.

The following query returns the data used in the code snippet:

http://en.wikipedia.org/w/api.php?action=query&titles=Stack_Overflow&prop=pageimages&format=json&pithumbsize=350
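Because one query can cover several articles, here is a sketch that builds the same query URL for a list of titles. It assumes a runtime with `URLSearchParams` (modern browsers, Node) and uses HTTPS instead of the original `http://`; `buildPageImagesUrl` is a hypothetical helper name. The MediaWiki API accepts multiple titles separated by `|`, and each result then appears under its own page ID in `query.pages`.

```javascript
// Sketch: build the pageimages query URL for one or more titles.
// buildPageImagesUrl is a hypothetical helper; the parameters are the ones
// from the URL above, with several titles joined by "|".
function buildPageImagesUrl(titles, thumbSize) {
  var params = new URLSearchParams({
    action: "query",
    titles: titles.join("|"),
    prop: "pageimages",
    format: "json",
    pithumbsize: String(thumbSize)
  });
  // URLSearchParams percent-encodes every value, so titles containing
  // spaces or other special characters are safe to pass in.
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

console.log(buildPageImagesUrl(["Car", "Stack Overflow"], 350));
```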

The OP can then extract the page IDs with this code:

var pageid = [];
for( var id in data.query.pages ) {
    pageid.push( id );
}

Run the code snippet below to test.

<html>
<body>
  
<img id="thumbnail"/>  
  
<script type="text/javascript">
 
var data =  {
      "query":
      {
        "normalized": [
        {
          "from": "Stack_Overflow",
          "to": "Stack Overflow"
        }],
        "pages":
        {
          "21721040":
          {
            "pageid": 21721040,
            "ns": 0,
            "title": "Stack Overflow",
            "thumbnail":
            {
              "source": "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Stack_Overflow_homepage.png/350px-Stack_Overflow_homepage.png",
              "width": 350,
              "height": 185
            },
            "pageimage": "Stack_Overflow_homepage.png"
          }
        }
      }
    };

// get the page IDs
var pageid = [];
for (var id in data.query.pages) {
  pageid.push(id);
}

// display the thumbnail using the first page ID
document.getElementById('thumbnail').src = data.query.pages[pageid[0]].thumbnail.source;

</script>

</body>
</html>
Yogi
  • for-in is considered a fairly dangerous pattern, though (because you're not guaranteed "safe" property names). Instead, ES5 has `Object.keys()` to safely get the set of property names for any object. – Mike 'Pomax' Kamermans May 03 '15 at 04:50
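A sketch of the `Object.keys()` approach from the comment above, applied to the whole response: it collects every thumbnail URL without knowing the page IDs and skips entries that have no `thumbnail`. `thumbnailSources` is a hypothetical helper name. The second entry in the usage data is hypothetical, modeled (as an assumption) on how the API reports a title that has no article.

```javascript
// Sketch: collect every thumbnail URL in a pageimages response,
// whatever the page IDs are. thumbnailSources is a hypothetical helper.
function thumbnailSources(data) {
  var pages = data.query.pages;
  return Object.keys(pages)
    .map(function (id) {
      return pages[id];
    })
    // Skip entries without a thumbnail (no page image, or no such article).
    .filter(function (page) {
      return page.thumbnail !== undefined;
    })
    .map(function (page) {
      return page.thumbnail.source;
    });
}

// Usage: the "Stack Overflow" entry from the snippet above, plus a
// hypothetical entry without a thumbnail.
var data = {
  query: {
    pages: {
      "21721040": {
        pageid: 21721040,
        ns: 0,
        title: "Stack Overflow",
        thumbnail: {
          source: "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6a/Stack_Overflow_homepage.png/350px-Stack_Overflow_homepage.png",
          width: 350,
          height: 185
        }
      },
      "-1": { ns: 0, title: "Hypothetical missing article", missing: "" }
    }
  }
};

console.log(thumbnailSources(data));
```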

Just parse the response with JSON.parse (jQuery's $.getJSON already does this and passes the parsed object to your callback), so you have an object that looks like:

var response = {
  query: {
    pages: {
      "13673345":{
        pageid: 13673345,
        ns: 0,
        title: "Car",
        thumbnail: {
          source: "http://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Benz-velo.jpg/100px-Benz-velo.jpg",
          width: 100,
          height: 80
        },
        pageimage: "Benz-velo.jpg"
      }
    }
  }
};

Then you can clearly see that you don't need the pageid at all; you just need to process the correct entry in the "pages" object.

In this case there's only one, but even if there were several, just run through Object.keys() for the response.query.pages object:

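var pages = response.query.pages;
var propertyNames = Object.keys(pages);
propertyNames.forEach(function(propertyName) {
  var page = pages[propertyName];
  if (!page.thumbnail) {
    return; // this page has no page image
  }
  // the API returns the URL in `source` (not `src`)
  var thumbnail = page.thumbnail.source;
  // drop the "thumb" path segment, then cut everything after the original file extension
  var imgURL = thumbnail.replace("/thumb/", "/").replace(/\.(jpg|png).*/, ".$1");
  doSomethingWith(imgURL);
});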

(Note the file-extension regexp: not every image is a JPEG, so it matches both jpg and png, the two prevailing image formats on the web. Files with other extensions, such as gif or svg, would need to be added to the regexp.)
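An alternative to the extension regexp above (a different technique, not the answer's own): work on the path segments instead of the file extension. This sketch assumes thumbnail URLs keep the layout seen on this page, `.../thumb/<dir>/<dir>/<File>/<size>px-<File>`, and simply removes the `thumb` segment and the trailing size segment, so any extension is handled. `originalFromThumbnail` is a hypothetical helper name, and the SVG URL in the usage line is a made-up example of that layout.

```javascript
// Sketch: derive the original file URL from a thumbnail URL by its path
// structure. originalFromThumbnail is a hypothetical helper name.
function originalFromThumbnail(thumbUrl) {
  var url = new URL(thumbUrl);
  var parts = url.pathname.split("/");
  var i = parts.indexOf("thumb");
  if (i === -1) {
    return thumbUrl; // not a thumbnail URL; leave it unchanged
  }
  parts.splice(i, 1); // drop the "thumb" segment
  parts.pop();        // drop the trailing "<size>px-<File>" segment
  url.pathname = parts.join("/");
  return url.toString();
}

// Usage: the thumbnail from the question, and a hypothetical SVG thumbnail.
console.log(originalFromThumbnail("http://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Benz-velo.jpg/100px-Benz-velo.jpg"));
console.log(originalFromThumbnail("https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Example.svg/350px-Example.svg.png"));
```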

Mike 'Pomax' Kamermans
  • I get a TypeError for the imgURL line: `[Error] TypeError: undefined is not an object (evaluating 'thumbnail.replace')`. What's the problem? – Benck May 02 '15 at 19:38
  • That would make sense. Don't just copy-paste my code; make sure I don't have typos. Instead of saying there is an error, what you could have said was "hey, the JSON uses `source`, not `src`". SO is great for getting answers, but always make sure the code people suggest is free of typos ;) – Mike 'Pomax' Kamermans May 02 '15 at 21:50
  • @Benck - There's a minor mistake in the above code causing the error. Change "page.thumbnail.src;" to "page.thumbnail.source;" and it will work. Other than that, Mike K provides a solid answer to your question. – Yogi May 03 '15 at 00:13