I've been trying to figure this one out for about a week now and just can't come up with a good solution. So, I figured I would see if anyone could help me out. Here's one of the links that I'm trying to scrape:
I right-clicked to copy image location. This is the link that is copied:
(Can't paste this as a link because I'm new) http:// content (dot) lib (dot) washington (dot) edu/cgi-bin/getimage.exe?CISOROOT=/alaskawcanada&CISOPTR=491&DMSCALE=100.00000&DMWIDTH=802&DMHEIGHT=657.890625&DMX=0&DMY=0&DMTEXT=%20NA3050%20%09AWC0644%20AWC0388%20AWC0074%20AWC0575&REC=4&DMTHUMB=0&DMROTATE=0
There is no clear image URL being displayed. Obviously that's because the image is hidden behind some type of script. Through trial and error I found that I can put ".jpg" after the "CISOPTR=491" and then the link becomes an Image URL. The problem is that this is not the high-resolution version of the image. To get to the high-resolution version I have to change the URL even more. I found a lot of articles @Stackoverflow.com to mention trying to build a script using curl and PHP, I have even tried a few of them with no luck. "491" is the image number and I can change that number to find other images in the same directory. So, scraping a sequence of numbers should be pretty easy. But I'm still a noob at scraping and this one is kicking my butt. Here's what I've tried.
Get remote image using cURL then resample
also tried this.
http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html
I also have Outwit Hub, and Site Sucker, but they don't recognize the URL as an image file and fo they just pass right ove it. I used SiteSucker overnight and it download 40,000 files and only 60 were jpegs, none of which were the ones I wanted.
The other thing I keep running into, is the files I have been able to download manually, the filename is always either getfile.exe or showfile.exe and then if I manually add ".jpg" as the extension I can view the image locally.
How can I reached the original high-res image file, and automate the download process so that I can scrape a couple hundred of these images?