You can use libcurl both for textual content (e.g. of mime type text/html) and for images (e.g. of mime type image/jpeg). Read the libcurl tutorial. You might also want to study the source code of wget.
You probably need to fetch the entire content of some URL into a buffer. You'll also need to keep track of that buffer's filled size and grow it as data arrives (using malloc, calloc, or realloc).
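For instance, here is a minimal sketch of fetching a URL into such a growing buffer with libcurl. The names membuf, append_to_membuf and fetch_url are mine, and error handling plus curl_global_init are omitted for brevity:

/* sketch only: grow a buffer with realloc from a libcurl write callback */
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

struct membuf {              /* a growable buffer */
    char  *data;
    size_t size;             /* filled size, not counting the final '\0' */
};

/* called by libcurl for every chunk of received data */
static size_t append_to_membuf(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct membuf *buf = userdata;
    size_t chunk = size * nmemb;
    char *grown = realloc(buf->data, buf->size + chunk + 1);
    if (!grown)
        return 0;            /* returning less than chunk aborts the transfer */
    buf->data = grown;
    memcpy(buf->data + buf->size, ptr, chunk);
    buf->size += chunk;
    buf->data[buf->size] = '\0';
    return chunk;
}

/* fetch the whole content of url into buf; returns 0 on success */
static int fetch_url(const char *url, struct membuf *buf)
{
    CURL *curl = curl_easy_init();
    if (!curl)
        return -1;
    buf->data = NULL;
    buf->size = 0;
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, append_to_membuf);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, buf);
    CURLcode res = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return (res == CURLE_OK) ? 0 : -1;
}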
You probably want to fetch the HTML page first, then somehow parse its HTML content and look for <img ...> tags. (You might start by using strstr to repeatedly find the <img string; you would also use snprintf to build strings.) Then parse their src= attribute and try to compute a URL from it.
Something like:
#include <string.h>   /* for strstr */

const char* pagecontent;
/// retrieve the page content using libcurl,
/// check that its mime type is text/html,
char* imgtag = NULL;
for (imgtag = strstr(pagecontent, "<img ");
     imgtag != NULL;
     imgtag = strstr(imgtag + 5, "<img "))
{
    char* srcattr = strstr(imgtag, "src=");
    if (srcattr) {
        /// parse the src value just after srcattr + 4
        /// build an URL for the image using snprintf
        /// retrieve that image using libcurl
    }
}
Obviously, you need to understand a bit of HTML.
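For instance, a src attribute may be an absolute URL, a root-relative path, or a path relative to the page, so you need to rebuild a full URL before giving it to libcurl. A very rough sketch with snprintf (build_image_url is just an illustrative name and ignores many corner cases such as .., protocol-relative //host URLs, query strings, and the page's own directory):

#include <stdio.h>
#include <string.h>

void build_image_url(char *out, size_t outsize,
                     const char *siteroot,  /* e.g. "http://example.com", no trailing slash */
                     const char *src)       /* the value of the src= attribute */
{
    if (strncmp(src, "http://", 7) == 0 || strncmp(src, "https://", 8) == 0)
        snprintf(out, outsize, "%s", src);             /* already absolute */
    else if (src[0] == '/')
        snprintf(out, outsize, "%s%s", siteroot, src);  /* root-relative */
    else
        snprintf(out, outsize, "%s/%s", siteroot, src); /* naive fallback */
}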
In practice, looking for <img> tags is not fail-proof. Some sites are mostly AJAX-driven and may fetch their images through AJAX requests. (Actually, I believe that because of AJAX or embedded JavaScript, finding all images is undecidable, and could probably be proven equivalent to the halting problem.)
If you are a newbie in C, don't forget to compile with all warnings and debug info (e.g. gcc -Wall -Wextra -g ...) and learn how to use the debugger (e.g. gdb).
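Since you are using libcurl, you also need to link against it, e.g. gcc -Wall -Wextra -g yourprogram.c -lcurl -o yourprogram (the file name here is just an example; curl-config --libs can tell you the exact linker flags on your system).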