
I have a problem that is really difficult for me... I hope that for more experienced C programmers it's not a problem. I have to create a program (in as easy a way as possible) which downloads all images from some web page. The program must be in C (not C++ or any other language).

I found the cURL library for downloading the page's source code, but I don't have any idea how to download the images.

EDIT: I only need to get the images with .jpg, .gif and .png extensions.

Please help me. I am an inexperienced programmer, so please give clear answers.

Thank you in advance.

Virgin

2 Answers


You can do it like this:

#include <stdio.h>
#include <curl/curl.h>

/* libcurl calls this for each chunk of received data;
   we simply write the bytes to the open file */
size_t write_data(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    return fwrite(ptr, size, nmemb, stream);
}

int main(void) {
    CURL *curl;
    FILE *fp;
    CURLcode res;
    const char *url = "http://localhost/image.jpeg";
    const char outfilename[] = "saveimage.jpeg";

    curl = curl_easy_init();
    if (curl) {
        fp = fopen(outfilename, "wb");
        if (!fp) {
            curl_easy_cleanup(curl);
            return 1;
        }
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);
        res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "download failed: %s\n", curl_easy_strerror(res));
        /* always cleanup */
        curl_easy_cleanup(curl);
        fclose(fp);
    }
    return 0;
}
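
To build it, compile and link against libcurl; on a typical Linux setup (assuming the file is saved as saveimage.c, which is just an example name) that would be:

gcc -Wall saveimage.c -o saveimage -lcurl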

You can also try using wget to download all the images:

#include <stdio.h>
#include <stdlib.h>   /* for system() */

int main(void)
{
    /* -A restricts downloads to the listed extensions, -r recurses */
    char command[] = "wget -A png,jpeg,jpg,gif -r http://www.freeimages.com/";
    system(command);
    return 0;
}
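
Note that this just shells out to wget, so it only works when wget is installed and on the PATH; the -A option restricts the download to the listed extensions, and -r makes wget crawl the site recursively.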
Jayesh Bhoi

You can use libcurl both for textual content (e.g. of MIME type text/html) and for images (e.g. of MIME type image/jpeg). Read the libcurl tutorial. You might also want to study the source code of wget.

You probably need to fetch the entire content of some URL into a buffer. You'll probably need to keep track of the filled size of that buffer and grow it (using malloc, calloc, or realloc).
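
For example, here is a minimal sketch of such a growing buffer used as a libcurl write callback (the names membuf and grow_buffer are placeholders of mine, not part of libcurl):

#include <stdlib.h>
#include <string.h>

/* growable buffer that accumulates the whole page in memory */
struct membuf {
    char  *data;   /* realloc'ed block, kept NUL-terminated */
    size_t size;   /* bytes filled so far, excluding the NUL */
};

static size_t grow_buffer(void *ptr, size_t size, size_t nmemb, void *userp)
{
    size_t n = size * nmemb;
    struct membuf *buf = userp;
    char *p = realloc(buf->data, buf->size + n + 1);
    if (!p)
        return 0;                 /* returning less than n aborts the transfer */
    buf->data = p;
    memcpy(buf->data + buf->size, ptr, n);
    buf->size += n;
    buf->data[buf->size] = '\0';  /* NUL-terminate so strstr() works on it */
    return n;
}

You would register it with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, grow_buffer) and pass a zero-initialized struct membuf through CURLOPT_WRITEDATA.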

You probably want to fetch the HTML page first, then parse its HTML content somehow, looking for <img tags. (You might start by using strstr to repeatedly find the <img string; you would also use snprintf to build some strings.) Then parse their src= attribute and try to compute a URL from it.

Something like:

 const char *pagecontent;
 /* ... retrieve the page content into pagecontent using libcurl,
    and check that its MIME type is text/html ... */
 char *imgtag = NULL;
 for (imgtag = strstr(pagecontent, "<img ");
      imgtag != NULL;
      imgtag = strstr(imgtag + 4, "<img ")) {
     char *srcattr = strstr(imgtag, "src=");
     if (srcattr) {
         /* parse the src value just after srcattr + 4,
            build a URL for the image using snprintf,
            then retrieve that image using libcurl */
     }
 }

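To give a feel for the src= parsing step, here is a rough sketch (the helper name extract_src and the naive join with a base URL are my own assumptions; a real program would need a proper URL resolver):

#include <stdio.h>
#include <string.h>

/* hypothetical helper: copy the quoted src value into out,
   prefixing base when the value is not already absolute */
static int extract_src(const char *srcattr, const char *base,
                       char *out, size_t outlen)
{
    const char *p = srcattr + 4;              /* skip over "src=" */
    char quote = *p;
    if (quote != '"' && quote != '\'')
        return -1;                            /* unquoted src: not handled */
    p++;
    const char *end = strchr(p, quote);
    if (!end)
        return -1;
    int len = (int)(end - p);
    if (strncmp(p, "http://", 7) == 0 || strncmp(p, "https://", 8) == 0)
        snprintf(out, outlen, "%.*s", len, p);         /* already absolute */
    else
        snprintf(out, outlen, "%s%.*s", base, len, p); /* naive join with base */
    return 0;
}
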
Obviously, you need to understand a bit of HTML.

In practice, looking for <img> tags is not foolproof. Some sites are mostly AJAX-driven and could fetch their images using AJAX requests.

(Actually, I believe that because of AJAX or embedded JavaScript, finding all images is undecidable, and could probably be proven equivalent to the halting problem.)

If you are a newbie in C, don't forget to compile with all warnings and debug info (e.g. gcc -Wall -Wextra -g ....) and learn how to use the debugger (e.g. gdb).

Basile Starynkevitch