
I have a list of Wikipedia articles (taken from my own browser history). I would like to draw a tree of my visits on Wikipedia, with one line per internal Wikipedia hyperlink. For a nice result, I would like to label each node with the name of the article and, for articles that have at least one image in the body, with an image extracted from the article.

Which image is the best candidate? I noticed that there is sometimes an image named thumbimage, but this is not always the case.

alecail
  • Related, close to duplicate: http://stackoverflow.com/questions/12147886/how-can-i-get-the-principal-image-from-mediawiki-api – Ilmari Karonen Feb 08 '14 at 14:59

3 Answers


Check out the DBPedia image dataset:

http://wiki.dbpedia.org/Downloads38#h227-1

They have chosen a representative image for many articles. The dataset is not updated very often (the latest release is from June 2012, I think, which is about 4 months ago as I write), but they do a really good job, and you could possibly run their codebase yourself to parse the articles if you need more current data.

mrjf

I've had a similar experience trying to grab a relevant image from a particular page. In my case, I use the og:image property (the Open Graph image metadata of the page).

You can read more about it here: http://ogp.me/
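As a rough illustration of the og:image idea, here is a minimal Python sketch that pulls the first og:image value out of an HTML string using only the standard library. It assumes the page actually exposes a `<meta property="og:image" content="...">` tag; the names `OgImageParser` and `find_og_image` are made up for this example, and fetching the page is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class OgImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only until a value has been found.
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image":
            self.og_image = attr_map.get("content")


def find_og_image(html: str) -> Optional[str]:
    """Return the og:image URL found in `html`, or None if there is none."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image
```

If `find_og_image` returns None, the page has no usable og:image tag and some other strategy is needed.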

ThaDick

Another approach would be to parse an image out of the page yourself, using either the HTML or the MediaWiki markup. I would suggest taking the infobox image if one is available and, failing that, the first image on the page.
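A minimal Python sketch of that fallback on the rendered HTML, using only the standard library, could look like the following. It assumes the infobox is rendered as a `<table>` whose class list contains `infobox`, which may not hold for every article or wiki; `ArticleImageParser` and `pick_article_image` are hypothetical names for this example.

```python
from html.parser import HTMLParser
from typing import Optional


class ArticleImageParser(HTMLParser):
    """Records the first <img> inside the infobox and the first <img> overall."""

    def __init__(self) -> None:
        super().__init__()
        self.table_depth = 0  # <table> nesting depth, counted from the infobox table
        self.infobox_image: Optional[str] = None
        self.first_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "table":
            classes = (attr_map.get("class") or "").split()
            # Assumption: the infobox is a <table class="infobox ...">.
            if self.table_depth > 0 or "infobox" in classes:
                self.table_depth += 1
        elif tag == "img":
            src = attr_map.get("src")
            if not src:
                return
            if self.first_image is None:
                self.first_image = src
            if self.table_depth > 0 and self.infobox_image is None:
                self.infobox_image = src

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def pick_article_image(html: str) -> Optional[str]:
    """Prefer the infobox image; otherwise fall back to the first image, or None."""
    parser = ArticleImageParser()
    parser.feed(html)
    return parser.infobox_image or parser.first_image
```

In practice the first image on a page can be a small icon, so filtering on the `width` attribute before accepting it might be worthwhile.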

mrjf