2

I'm loading images from URLs and showing them in a TImage. I need to handle JPG, GIF and PNG, but I do not know how to find out which type each file is so that I can treat them differently. How can I read the header, or use some other method, to tell whether a file is a GIF, PNG or JPG?

var
  MS: TMemoryStream;
  GIF: TGIFImage;
  jpegimg: TJPEGImage;
begin
  MS := TMemoryStream.Create;
  GIF := TGIFImage.Create;
  jpegimg := TJPEGImage.Create;
  try
    try
      // Download the image into memory, then rewind before decoding.
      IdHTTP1.Get('http://forum.wmonline.com.br/uploads/av-8929.jpg', MS);
      MS.Position := 0;
      // GIF.LoadFromStream(MS);
      // Logo.Picture.Assign(GIF);
      jpegimg.LoadFromStream(MS);
      Logo.Picture.Assign(jpegimg);
    except
      ShowMessage('ERRO');
      Exit; // the finally block below still runs
    end;
  finally
    FreeAndNil(GIF);
    FreeAndNil(MS);
    jpegimg.Free;
  end;
end;
abcd
  • 2
    Which file type could it be here? You have only one choice: `IdHTTP1.get('...../av-8929.jpg',...)`. If you do not trust the file extension, look at the file header: [Portable_Network_Graphics](https://en.wikipedia.org/wiki/Portable_Network_Graphics#File_header). – moskito-x Aug 10 '15 at 00:13

2 Answers

10

There are mechanisms intended to allow the description of the content of a request (or response), but any external meta-data may be unreliable, being wholly dependent upon an accurate implementation and setting of the meta-data involved. In some cases that meta-data may be incorrect or entirely missing.

Fortunately, in common with many file formats, the specifications for the image file types you mention all mandate a specific header that identifies the file (or stream) as conforming (or aspiring to conform) to the relevant specification.

The first 3 bytes of a GIF file are:

`G` `I` `F`    (ASCII)

You may also wish to check the subsequent 3 bytes for a valid GIF version number, also encoded in ASCII:

`8` `9` `a`   or `8` `7` `a`

The first 8 bytes of a PNG file have the values:

137 80 78 71 13 10 26 10   (decimal)

The first 2 bytes of a JPEG file are:

FF D8   (hex)

So to detect the format of the data in a response stream you need only inspect at most the first 8 bytes of the stream for one of these expected header values.
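The header checks above can be sketched in Delphi roughly as follows. This is a minimal sketch, not a definitive implementation: the unit name `ImageSniff`, the type `TImageKind` and the function `DetectImageKind` are hypothetical names introduced for illustration, and the unscoped unit names `Classes` and `SysUtils` assume either an older Delphi or a project with the default unit scope names. For GIF it checks only the three signature bytes, not the version.

```pascal
unit ImageSniff;

interface

uses
  Classes, SysUtils;

type
  // Hypothetical enumeration used only by this sketch.
  TImageKind = (ikUnknown, ikGIF, ikPNG, ikJPEG);

// Inspects at most the first 8 bytes of AStream and restores the
// original stream position afterwards.
function DetectImageKind(AStream: TStream): TImageKind;

implementation

const
  // PNG signature: 137 80 78 71 13 10 26 10 (decimal)
  PNG_SIG: array[0..7] of Byte = (137, 80, 78, 71, 13, 10, 26, 10);

function DetectImageKind(AStream: TStream): TImageKind;
var
  Buf: array[0..7] of Byte;
  SavedPos: Int64;
  Count: Integer;
begin
  Result := ikUnknown;
  SavedPos := AStream.Position;
  try
    AStream.Position := 0;
    // Read returns the number of bytes actually read, which may be
    // fewer than 8 for a very short stream.
    Count := AStream.Read(Buf, SizeOf(Buf));
    if (Count >= 8) and CompareMem(@Buf[0], @PNG_SIG[0], 8) then
      Result := ikPNG
    else if (Count >= 3) and (Buf[0] = Ord('G')) and
            (Buf[1] = Ord('I')) and (Buf[2] = Ord('F')) then
      Result := ikGIF
    else if (Count >= 2) and (Buf[0] = $FF) and (Buf[1] = $D8) then
      Result := ikJPEG;
  finally
    AStream.Position := SavedPos;
  end;
end;

end.
```

With the question's code, this could be called as `DetectImageKind(MS)` right after `IdHTTP1.Get(..., MS)`, and the result used to choose between `TGIFImage`, `TJPEGImage` and a PNG class before calling `LoadFromStream`.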

Marco
Deltics
  • 1
    I'm wondering why you feel that using the content type is a bad idea – David Heffernan Aug 10 '15 at 12:12
  • 2
    Moreover, I find it disturbing that this one got significantly more votes than the proper HTTP way. – Free Consulting Aug 10 '15 at 17:10
  • @DavidHeffernan : **OP :** "I want because there are URLs that do not show extensions". A simple test: take one file, 5206.gif, the same file with another extension, 5206.jpg, and the same file without an extension. What you get from `header := IdHTTP1.Response.ContentType`: `5206.gif` -> **image/gif** :: `5206.jpg` -> **image/jpeg** :: `5206` -> **text/plain**. A proper way to get the file type? Really? Look at the `MemoryStream`: 10 bytes are enough, and you can be sure what `file type` you have downloaded. – moskito-x Aug 10 '15 at 18:51
  • @moskito Why are extensions relevant to response type? – David Heffernan Aug 10 '15 at 19:09
  • @DavidHeffernan : ask `IdHTTP1.Response.ContentType` . I don't know. – moskito-x Aug 10 '15 at 19:10
  • @moskito Do you know why response type is not reliable? – David Heffernan Aug 10 '15 at 19:12
  • @DavidHeffernan : "I'm wondering why you feel that using the content type is a bad idea" . moskito-x : That's why I feel using `the content type` is a bad idea. (for that kind of code `jpegimg.LoadFromStream(MS);`). – moskito-x Aug 10 '15 at 19:18
  • @moskito-x I've no idea what you are trying to say. Never mind. – David Heffernan Aug 10 '15 at 19:23
  • @DavidHeffernan : "Do you know why response type is not reliable?" moskito-x : "It seems to me you know it and are using it anyway?" – moskito-x Aug 10 '15 at 19:29
  • 5
    @DavidHeffernan: "I'm wondering why you feel that using the content type is a bad idea" - because it CAN be, and sometimes IS, wrong. Case in point: moskito's example, where requesting `5206.jpg` (which is just a copy of `5206.gif` with a different extension) reports `image/jpeg` instead of `image/gif`, and requesting `5206` without any extension reports `text/plain` instead of `image/gif` or even `application/octet-stream`. So relying on the HTTP `Content-Type` header, although the *preferred* solution, CAN be wrong at times if the server is misconfigured. – Remy Lebeau Aug 10 '15 at 19:35
  • @RemyLebeau This is what I was asking. – David Heffernan Aug 10 '15 at 19:36
  • 2
    @David - Did I say it was a bad idea? No. I do observe that the content-type is not certain to be reliable. That is a simple fact. Someone/something has to set that external meta-data and may not do so correctly. For content such as text, the content-type can be crucial (to differentiate xml and html, for example), but in the case of these particular file types you can determine the actual type of the content very easily and efficiently from the content itself, in a way that is certain to be reliable, where **external** meta-data may not be. – Deltics Aug 10 '15 at 19:42
  • 2
    @Free Consulting - I don't understand why you should find it disturbing. The "*proper HTTP way*" is a surprisingly nebulous concept. An incorrectly implemented HTTP server may not provide a reliable content-type, and this is unfortunately not that unusual in my experience. When building an HTTP **server**, how would you determine the correct value for the content-type of an image file you were serving? The "proper file system way" will not work if the file is named incorrectly. "Reliable" trumps "proper". Ideally proper = reliable, but we live in the real world, not the ideal. – Deltics Aug 10 '15 at 19:53
  • **Deltics**, you are trying to support a system that is already broken. Just as outlined by @moskito-x: e.g. Apache configured to use the filename suffix to choose a MIME type. While httpd would behave correctly(!) in such a case, the mismatch between file data and file name would result in an incorrect type being served. – Free Consulting Aug 11 '15 at 06:10
  • @RemyLebeau, I believe asking for `/5206` (w/o suffix) in vanilla configuration would result in *entity not found*. httpd MIGHT be (mis)configured to **vary** to either `5206.gif` or `5206.png` ofc (while proper workaround SHOULD be to ask `libmagic` about format). – Free Consulting Aug 11 '15 at 06:18
4

I found a way to do what I want, which I needed because some URLs do not include an extension.

Simply extract the image type from the server response:

header := IdHTTP1.Response.ContentType;

image/jpeg = JPG

image/gif = GIF

image/png = PNG
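The mapping above could be written as a small helper along these lines. It is only a sketch under stated assumptions: the unit name `ContentTypeMap` and the function `GraphicClassFromContentType` are hypothetical, and the unscoped unit names (`Graphics`, `jpeg`, `GIFImg`, `pngimage`) assume a VCL project with the default unit scope names. It returns `nil` for anything it does not recognise, so the caller can fall back to another check.

```pascal
unit ContentTypeMap;

interface

uses
  SysUtils, Graphics, jpeg, GIFImg, pngimage;

// Hypothetical helper: maps an HTTP Content-Type value to a VCL
// graphic class, or nil when the type is not one of the three handled.
function GraphicClassFromContentType(const AContentType: string): TGraphicClass;

implementation

function GraphicClassFromContentType(const AContentType: string): TGraphicClass;
var
  MediaType: string;
  P: Integer;
begin
  // Drop any parameters after ';' and normalise case and whitespace.
  MediaType := AContentType;
  P := Pos(';', MediaType);
  if P > 0 then
    MediaType := Copy(MediaType, 1, P - 1);
  MediaType := LowerCase(Trim(MediaType));

  if MediaType = 'image/jpeg' then
    Result := TJPEGImage
  else if MediaType = 'image/gif' then
    Result := TGIFImage
  else if MediaType = 'image/png' then
    Result := TPngImage
  else
    Result := nil; // unknown type: the caller decides what to do
end;

end.
```

A caller might pass `IdHTTP1.Response.ContentType` to this function, create an instance of the returned class, call `LoadFromStream(MS)` on it and assign it to `Logo.Picture`, treating a `nil` result as "type not determined".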

Remy Lebeau
abcd
  • @Remy, @abcd :: **OP :** "I want because there are URLs that do not show extensions". A simple test: take one file, 5206.gif, the same file with another extension, 5206.jpg, and the same file without an extension. What you get from `header := IdHTTP1.Response.ContentType`: `5206.gif` -> **image/gif** :: `5206.jpg` -> **image/jpeg** :: `5206` -> **text/plain**. A proper way to get the file type? Really? – moskito-x Aug 10 '15 at 18:48
  • 2
    @moskito-x: It is the server's responsibility to send the correct `Content-Type`. File names are optional. If the server is using the extension of the server-side file to report its type, and there is no extension on the file, then the server should be reporting `application/octet-stream`; `text/plain` is an error. It is likely a default value that the server is falling back on. To compensate for that, you would have to ignore the `Content-Type` and look at the file's actual bytes to check whether an image header is present. Most image formats are easily identified by their header. – Remy Lebeau Aug 10 '15 at 19:28
  • @RemyLebeau : That's what I said : **Look at the file's actual bytes to check if an image header is present or not**. Thanks for that. – moskito-x Aug 10 '15 at 19:31
  • 2
    @Remy... what !? You mean content-type is not certain to be reliable and you have to check the actual content to determine the actual content type when/in case the server cannot be relied on. Gosh. Who would have thought ? ;) – Deltics Aug 10 '15 at 19:55
  • @Deltics As I read this, both you and Remy seem to be implying that content type cannot be trusted and should always be ignored. If that is the case, I wonder what the point of content type is. – David Heffernan Aug 11 '15 at 11:09