You can use libcurl both for textual content (e.g. of mime type text/html) and for images (e.g. of mime type image/jpeg). Read the libcurl tutorial. You might also want to study the source code of wget.
You probably need to fetch the entire content of some URL into a buffer. You'll also need to keep track of that buffer's filled size and grow it as data arrives (using malloc, calloc, or realloc).
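For instance, here is a minimal sketch of fetching a URL into such a growing buffer with libcurl. The names membuf, append_to_membuf and fetch_url are mine, and error handling plus curl_global_init are omitted for brevity:

/* sketch only: grow a buffer with realloc from a libcurl write callback */
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

struct membuf {              /* a growable buffer */
    char  *data;
    size_t size;             /* filled size, not counting the final '\0' */
};

/* called by libcurl for every chunk of received data */
static size_t append_to_membuf(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct membuf *buf = userdata;
    size_t chunk = size * nmemb;
    char *grown = realloc(buf->data, buf->size + chunk + 1);
    if (!grown)
        return 0;            /* returning less than chunk aborts the transfer */
    buf->data = grown;
    memcpy(buf->data + buf->size, ptr, chunk);
    buf->size += chunk;
    buf->data[buf->size] = '\0';
    return chunk;
}

/* fetch the whole content of url into buf; returns 0 on success */
static int fetch_url(const char *url, struct membuf *buf)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return -1;
    buf->data = NULL;
    buf->size = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_to_membuf);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, buf);
    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return (res == CURLE_OK) ? 0 : -1;
}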
You probably want to fetch the HTML page first, then somehow parse its HTML content and look for <img ...> tags. (You might start by using strstr to repeatedly find the <img string; you would also use snprintf to build strings.) Then parse their src= attribute and try to compute a URL from it.
Something like:
#include <string.h>   /* for strstr */

const char* pagecontent;
/// retrieve the page content using libcurl,
/// check that its mime type is text/html,
char* imgtag = NULL;
for (imgtag = strstr(pagecontent, "<img ");
     imgtag != NULL;
     imgtag = strstr(imgtag + 5, "<img "))
{
    char* srcattr = strstr(imgtag, "src=");
    if (srcattr) {
        /// parse the src value just after srcattr + 4
        /// build an URL for the image using snprintf
        /// retrieve that image using libcurl
    }
}
Obviously, you need to understand a bit of HTML.
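For instance, a src attribute may be an absolute URL, a root-relative path, or a path relative to the page, so you need to rebuild a full URL before giving it to libcurl. A very rough sketch with snprintf (build_image_url is just an illustrative name and ignores many corner cases such as .., protocol-relative //host URLs, query strings, and the page's own directory):

#include <stdio.h>
#include <string.h>

void build_image_url(char *out, size_t outsize,
                     const char *siteroot,  /* e.g. "http://example.com", no trailing slash */
                     const char *src)       /* the value of the src= attribute */
{
    if (strncmp(src, "http://", 7) == 0 || strncmp(src, "https://", 8) == 0)
        snprintf(out, outsize, "%s", src);             /* already absolute */
    else if (src[0] == '/')
        snprintf(out, outsize, "%s%s", siteroot, src);  /* root-relative */
    else
        snprintf(out, outsize, "%s/%s", siteroot, src); /* naive fallback */
}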
In practice, looking for <img> tags is not fail-proof. Some sites are mostly AJAX-driven and may fetch their images through AJAX requests. (Actually, I believe that because of AJAX or embedded JavaScript, finding all images is undecidable, and could probably be proven equivalent to the halting problem.)
If you are a newbie in C, don't forget to compile with all warnings and debug info (e.g. gcc -Wall -Wextra -g ...) and learn how to use the debugger (e.g. gdb).
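Since you are using libcurl, you also need to link against it, e.g. gcc -Wall -Wextra -g yourprogram.c -lcurl -o yourprogram (the file name here is just an example; curl-config --libs can tell you the exact linker flags on your system).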