
I have a filename from Wikimedia Commons and I want to access the thumbnail image directly.

Example: Tour_Eiffel_Wikimedia_Commons.jpg

I found a way to get JSON data containing the URL of the thumbnail I want:

https://en.wikipedia.org/w/api.php?action=query&titles=Image:Tour_Eiffel_Wikimedia_Commons.jpg&prop=imageinfo&iiprop=url&iiurlwidth=200

but I don't want to make an extra request. Is there a way to access the thumbnail directly?

user2033412

3 Answers


If you're okay with relying on the current URL scheme staying the same in the future (which is not guaranteed), you can build the thumbnail URL yourself.

The URL looks like this:

https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/200px-Tour_Eiffel_Wikimedia_Commons.jpg

  • The first part is always the same: https://upload.wikimedia.org/wikipedia/commons/thumb
  • The second part is the first character of the MD5 hash of the file name. In this case, the MD5 hash of Tour_Eiffel_Wikimedia_Commons.jpg is a85d416ee427dfaee44b9248229a9cdd, so we get /a.
  • The third part is the first two characters of the MD5 hash from above: /a8.
  • The fourth part is the file name: /Tour_Eiffel_Wikimedia_Commons.jpg
  • The last part is the desired thumbnail width, and the file name again: /200px-Tour_Eiffel_Wikimedia_Commons.jpg
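As a rough check of the steps above, here is a minimal Python sketch that assembles the example URL piece by piece. The variable names are only illustrative, and it assumes the file name already uses underscores rather than spaces.

```python
import hashlib

file_name = "Tour_Eiffel_Wikimedia_Commons.jpg"
width = 200

# MD5 of the UTF-8 encoded file name, as a hex string
digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()

parts = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb",  # fixed prefix
    digest[0],                 # first hex character of the hash
    digest[:2],                # first two hex characters of the hash
    file_name,                 # the file name
    f"{width}px-{file_name}",  # desired width, then the file name again
]
print("/".join(parts))
```

The printed URL should match the one shown above, as long as the URL scheme has not changed.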
svick
  • I love it! Thank you! – user2033412 Nov 13 '15 at 11:41
  • @svick Thanks for the answer, it got me a long way. One thing, though: the MD5 hash is calculated after replacing all spaces in the filename with underscores. – simone Mar 12 '20 at 20:59
  • Just learned that if the image is a .svg you need to add .png to make it work. E.g.: https://upload.wikimedia.org/wikipedia/commons/thumb/6/65/Ei-map.svg/440px-Ei-map.svg.png – loomi Oct 13 '20 at 10:48

Solution in Python based on the solution of @svick:

import hashlib

def get_wc_thumb(image, width=300):
    """Build a Commons thumbnail URL. image: file name (e.g. from Wikidata); width: pixels."""
    image = image.replace(' ', '_')  # the hash is computed with spaces replaced by underscores
    digest = hashlib.md5(image.encode('utf-8')).hexdigest()
    return ("https://upload.wikimedia.org/wikipedia/commons/thumb/"
            f"{digest[0]}/{digest[:2]}/{image}/{width}px-{image}")
loomi
  • For image https://commons.wikimedia.org/wiki/File:%E3%82%B8%E3%83%96%E3%83%81%E5%A4%A7%E4%BD%BF%E9%A4%A8%E3%81%AF%E4%B8%80%E8%BB%92%E5%AE%B6.jpg I get `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)` at `m.update(image.encode('utf-8'))`. – Nicolas Raoul Jan 12 '21 at 13:30
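The `UnicodeDecodeError` above is what Python 2 raises when `.encode()` is called on a byte string containing non-ASCII characters; in Python 3, where `str` is Unicode, the same call works. Below is a hedged Python 3 sketch that also percent-encodes the name for the URL and adds the `.png` suffix for SVG files mentioned in the comments on the first answer. The function name `commons_thumb_url` is made up for illustration, and the sketch assumes the name is already in the form Commons stores it (apart from spaces).

```python
import hashlib
from urllib.parse import quote, unquote


def commons_thumb_url(file_name: str, width: int = 300) -> str:
    """Build a Wikimedia Commons thumbnail URL without an API request.

    Relies on the current URL layout, which is not guaranteed to stay the same.
    """
    name = file_name.replace(" ", "_")  # the hash uses the underscore form
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    encoded = quote(name)  # percent-encode non-ASCII characters as UTF-8
    thumb = f"{width}px-{encoded}"
    if name.lower().endswith(".svg"):
        thumb += ".png"  # SVG thumbnails are served as PNG
    return (
        "https://upload.wikimedia.org/wikipedia/commons/thumb/"
        f"{digest[0]}/{digest[:2]}/{encoded}/{thumb}"
    )


# The file name from the comment above, decoded from its percent-encoded form
name = unquote(
    "%E3%82%B8%E3%83%96%E3%83%81%E5%A4%A7%E4%BD%BF%E9%A4%A8"
    "%E3%81%AF%E4%B8%80%E8%BB%92%E5%AE%B6.jpg"
)
print(commons_thumb_url(name, 200))
```

The hash is taken over the decoded name, while the URL itself carries the percent-encoded form, so the result contains only ASCII characters.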

In case anyone is doing this query in SPARQL instead of Python: There exists an MD5 function in SPARQL and the whole string manipulation can be implemented in SPARQL too!

  BIND(REPLACE(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/", "") as ?fileName) .
  BIND(REPLACE(?fileName, " ", "_") as ?safeFileName)
  BIND(MD5(?safeFileName) as ?fileNameMD5) .
  BIND(CONCAT("https://upload.wikimedia.org/wikipedia/commons/thumb/", SUBSTR(?fileNameMD5, 1, 1), "/", SUBSTR(?fileNameMD5, 1, 2), "/", ?safeFileName, "/650px-", ?safeFileName) as ?thumb)
 

This can be run as a live query in Wikidata's query service, as discussed here: https://discourse-mediawiki.wmflabs.org/t/accessing-a-commons-thumbnail-via-wikidata/499

Alexa
  • To make scalable vector graphics (SVG) work as well (a small improvement on the given answer), append `…svg.png` to the `?safeFileName` like so: `REPLACE(?safeFileName, "^(.+[Ss][Vv][Gg])$", "$1.png")` – andreas.naturwiki Sep 27 '21 at 15:17