3

I am trying to download an image from Wikimedia Commons by using a URL to a page in the file namespace:

wget http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG

All I get is a file with a .JPG extension that I cannot open. When you visit that link you see the file description page rather than the image itself, and the page has a link called "Full resolution" that points to the real image URL: http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG

How can I download this file when I only have the first link?

Rainer Rillke
Altin Ukshini

4 Answers

2

You can try the following:

wget -O output.html http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG && wget "$(grep fullMedia output.html | sed 's/\(.*href="\/\/\)\([^ ]*\)\(" class.*\)/\2/g')"

The first wget fetches the page you specify and saves it as output.html. I browsed a few pages and found that the link to the high-resolution image sits inside a div with class=fullMedia. The second command extracts the image URL from that div and fetches the image.
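To make the extraction step concrete, here is a small offline sketch of what the grep and sed pipeline does. The sample HTML line is made up to match what the sed pattern expects; the real markup on Commons may look different, and extract_full_media is only a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sample of the line the grep is expected to match. The real
# page markup is an assumption here and can change at any time.
sample='<div class="fullMedia"><a href="//upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG" class="internal">Full resolution</a></div>'

# Same grep and sed as in the one-liner above, reading HTML from stdin.
extract_full_media() {
    grep fullMedia | sed 's/\(.*href="\/\/\)\([^ ]*\)\(" class.*\)/\2/g'
}

# Keep only the second capture group: host and path, without a scheme.
extracted=$(printf '%s\n' "$sample" | extract_full_media)
printf '%s\n' "$extracted"
```

The result has no http: prefix because the pattern drops the leading //. wget treats a URL without a scheme as http, which is why the one-liner can pass the result straight to it.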

PS: As suggested above, bash is not a neat way of doing this. You should look at something that parses DOM trees.
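If scraping the HTML feels too brittle, one alternative raised in the comments on this answer is the MediaWiki API. Staying in shell for consistency with the rest of the thread, the sketch below only builds the query URL. commons_api_url is a hypothetical helper, and adding format=json and piping through jq in the usage comment are my assumptions, not something from the original answer.

```shell
#!/bin/sh
# commons_api_url is a hypothetical helper: it builds the imageinfo query URL
# for a file title given without the "File:" namespace prefix.
commons_api_url() {
    title=$1
    printf '%s\n' "http://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&format=json&titles=File:$title"
}

# Usage sketch (needs network access and jq, both assumptions):
# wget -qO- "$(commons_api_url 'A_golden_tree_during_the_golden_season.JPG')" \
#     | jq -r '.query.pages[].imageinfo[0].url'
```

Titles containing spaces or other special characters would need URL encoding first; this sketch does not handle that.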

jitendra
  • 1
    +1: You deserve some reputation points for your research effort. – johnsyweb Feb 23 '13 at 07:19
  • The output of action=view of index.php (both are used implicitly here through rewrite rules and MediaWiki defaults) depends on various factors and may change unexpectedly at any time. Do not rely on it. Either use the [API](http://commons.wikimedia.org/w/api.php?action=query&prop=imageinfo&iiprop=url&titles=File:A_golden_tree_during_the_golden_season.JPG) or extract the title without the namespace and pass it to [[Special:Redirect]], as I suggested in my reply below. – Rainer Rillke Jun 02 '14 at 16:10
2

Extract the title without namespace (A_golden_tree_during_the_golden_season.JPG) and pass it to Special:Redirect.

wget http://commons.wikimedia.org/wiki/Special:Redirect/file/$( echo 'http://commons.wikimedia.org/wiki/File:A_golden_tree_during_the_golden_season.JPG' | sed 's/.*\/File\:\(.*\)/\1/g' )
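For a whole list of File: page URLs, the same idea can be wrapped in a small function. This is a minimal sketch, not a tested download script: file_page_to_redirect and urls.txt are hypothetical names, and it uses shell parameter expansion in place of sed.

```shell
#!/bin/sh
# file_page_to_redirect is a hypothetical helper: it turns a File: page URL
# into the matching Special:Redirect/file URL.
file_page_to_redirect() {
    page_url=$1
    # Strip everything up to and including the last "/File:".
    title=${page_url##*/File:}
    printf '%s\n' "http://commons.wikimedia.org/wiki/Special:Redirect/file/$title"
}

# Usage sketch: urls.txt (assumed name) holds one File: page URL per line.
# while IFS= read -r url; do
#     wget "$(file_page_to_redirect "$url")"
# done < urls.txt
```

If a line contains no /File: part, the helper appends the whole input unchanged, so it is worth validating the list first.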
Rainer Rillke
0

wget http://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG

You were fetching the web page, not the image itself.

kkaehler
  • 2
    That seems to be understood in the question. – johnsyweb Feb 23 '13 at 02:26
  • Yes, I know, but I want to get it from the first link. I have a list of image names, so I can run wget link/File:imagename for each one, but that doesn't work because the file I download cannot be opened. – Altin Ukshini Feb 23 '13 at 02:30
0

You can use the following link to retrieve the image: https://upload.wikimedia.org/wikipedia/commons/9/92/A_golden_tree_during_the_golden_season.JPG I had the same problem. Click on the image on the file page and you will get the link above. I hope this helps.