
I know this may well be tagged as a DUPLICATE.

However, I am stuck on something I cannot resolve myself, so I need your help.

Basically, I abstracted the concept of reading the first 8 bytes of any image and deciding from them whether it is one of the types PNG, JPEG or GIF.

I am trying to achieve this in Java.

package examples;

import java.io.File;
import java.io.FileInputStream;
import java.io.PrintStream;

import org.apache.commons.io.IOUtils;

public class BlobCheck
{
    public static void main(String args[]) throws Exception
    {
    File dir = new File(args[0]);
    // These files will later be replaced by blobs from the database,
    // and each blob will be converted to bytes.
    File files[] = dir.listFiles();
    StringBuffer sb = new StringBuffer();
    StringBuilder chars = new StringBuilder();
    File afile[];
    int j = (afile = files).length;
    for (int i = 0; i < j; i++)
    {
        File file = afile[i];
        byte bytearr[];
        // Close the stream as soon as the file has been read.
        try (FileInputStream fis = new FileInputStream(file))
        {
            bytearr = IOUtils.toByteArray(fis);
        }
        long count = 0L;
        byte abyte0[];
        int l = (abyte0 = bytearr).length;
        for (int k = 0; k < l; k++)
        {
        byte b = abyte0[k];
        if (count == 8L)
            break;
        sb.append(b);
        chars.append((char) b);
        count++;
        }

        // if ("-1-40-1-320167470".equals(sb.toString()))
        /*
         * if ("-1-40-1".equals(sb.toString())) System.out.println((new
         * StringBuilder
         * (String.valueOf(file.getName()))).append(" is an image file ")
         * .append
         * (sb.toString()).append(" ").append(chars.toString()).toString());
         * else
         */
        System.out.println((new StringBuilder(String.valueOf(file.getName()))).append(" ").append(sb.toString()));
        sb.delete(0, sb.length());
        chars.delete(0, chars.length());
    }

    }
}

Now I fill a folder with a bunch of different types of files (images, docs, xls, etc.) and execute the class; I get the following output.

Here the first 8 byte values (in decimal) are different from what is given in the DUPLICATE (above). Surprisingly, most of the images have the same 8 bytes and a few do not (highlighted).

Output:

  • 2.jpg -1-40-1-320167470
  • 2g.gif -1-40-1-320167470
  • 324.png -1-40-1-320167470
  • 4.jpg -1-40-1-320167470
  • 6.jpg -1-40-1-320167470
  • 9.jpg -1-40-1-320167470
  • Logo.jpg -1-40-1-1801465100
  • Lpng.png -1-40-1-1801465100
  • picture.xls -48-4917-32-95-7926-31
  • Thumbs.db -48-4917-32-95-7926-31

Please let me know if I have gone wrong somewhere! Thanks.

chebus

2 Answers


I found the problem. Thank you, gyan. And I feel so stupid already. All I needed to do was check the hex codes of the bytes and not the decimals, as given at http://www.garykessler.net/library/file_sigs.html

The fix is simply: sb.append(String.format("%02X ", b));

for (int k = 0; k < l; k++)
        {
        byte b = abyte0[k];
        if (count == 8L)
            break;
        //System.out.println(file.getName()+" "+b);
        //sb.append(b);
        sb.append(String.format("%02X ", b));
        //System.out.printf("0x%x ", b);

        count++;
        }

And test as follows:

        if (sb.toString().startsWith("FF D8 FF"))
            System.out.println(file.getName() + " is JPG ");
        else if (sb.toString().startsWith("47 49 46 38 37 61") || sb.toString().startsWith("47 49 46 38 39 61"))
            System.out.println(file.getName() + " is GIF ");
        else if (sb.toString().startsWith("89 50 4E 47 0D 0A 1A 0A"))
            System.out.println(file.getName() + " is PNG ");

Output:

  • 2.jpg is JPG
  • 2g.gif is JPG // file extension had been changed from JPG to GIF
  • 324.png is JPG
  • 4.jpg is JPG
  • 6.jpg is JPG
  • 9.jpg is JPG
  • add1.JPG is JPG
  • Logo.jpg is JPG
  • Lpng.png is JPG // file extension had been changed from JPG to PNG
  • realGIF.gif is GIF
  • realPNG.png is PNG
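
For anyone comparing the two outputs: the decimal string from the question and the hex signature are the same bytes, just printed differently, because Java bytes are signed. Here is a minimal sketch of that; the class SignatureFormats and its method names are only illustrative and not part of the code above.

```java
// Hypothetical demo class, not part of the original BlobCheck code.
class SignatureFormats {

    // Concatenates signed decimal values, the way sb.append(b) does.
    static String toSignedDecimal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b);
        }
        return sb.toString();
    }

    // Formats each byte as two upper-case hex digits separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask with 0xFF so negative bytes are shown as 80..FF.
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The 8 bytes behind "-1-40-1-320167470" in the question's output.
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0,
                            0x00, 0x10, 0x4A, 0x46};
        System.out.println(toSignedDecimal(jpegStart)); // -1-40-1-320167470
        System.out.println(toHex(jpegStart));           // FF D8 FF E0 00 10 4A 46
    }
}
```

The concatenated decimal form is also ambiguous (there is no separator between values), which is one more reason to compare hex strings or the raw bytes instead.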
chebus

Maybe you are getting confused by the extension of the filename?

Try this: just rename a *.png file to *.jpeg and open it with any image editor/viewer; it should not complain about the format not being recognized. This could be the reason why you are getting the same 8 bytes even though the extensions are different.

Because, from what I have observed, many programs do not complain about a changed image file extension as long as they can process the file and show it in their window.

Edit: Please use the code below and post the output:
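
To make this concrete, here is a rough sketch that decides the type from the leading bytes and then checks whether the file name's extension agrees. The class ExtensionCheck and its helpers are made-up names for illustration; the signatures are the usual JPEG, GIF and PNG magic numbers.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class for illustration only.
class ExtensionCheck {

    static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    static final byte[] PNG = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    static final byte[] GIF87A = {0x47, 0x49, 0x46, 0x38, 0x37, 0x61};
    static final byte[] GIF89A = {0x47, 0x49, 0x46, 0x38, 0x39, 0x61};

    // True if 'data' begins with all bytes of 'signature'.
    static boolean startsWith(byte[] data, byte[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, signature.length), signature);
    }

    // Returns "jpg", "gif", "png" or "unknown" based on the leading bytes.
    static String detect(byte[] header) {
        if (startsWith(header, JPEG)) return "jpg";
        if (startsWith(header, GIF87A) || startsWith(header, GIF89A)) return "gif";
        if (startsWith(header, PNG)) return "png";
        return "unknown";
    }

    // True if the extension in 'fileName' agrees with the detected type.
    // Unrecognized content never matches.
    static boolean extensionMatches(String fileName, byte[] header) {
        String name = fileName.toLowerCase(Locale.ROOT);
        String detected = detect(header);
        if (detected.equals("jpg")) {
            return name.endsWith(".jpg") || name.endsWith(".jpeg");
        }
        return name.endsWith("." + detected);
    }
}
```

With this, a JPEG that was renamed to Lpng.png would be detected as "jpg" and reported as a mismatch, while Logo.jpg would match.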

import java.io.*;
import java.net.*;

public class ReadBytes {
    public static void main( String [] args ) throws IOException {

        URL url = new URL("http://your image url");

        // Read the image ...
        InputStream inputStream      = url.openStream();
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte [] buffer               = new byte[ 1024 ];

        int n = 0;
        while (-1 != (n = inputStream.read(buffer))) {
           output.write(buffer, 0, n);
        }
        inputStream.close();

        // Here's the content of the image...
        byte [] data = output.toByteArray();

        // Write it to a file just to compare...
        OutputStream out = new FileOutputStream("data.png");
        out.write( data );
        out.close();

        // Print it to stdout
        for( byte b : data ) {
            System.out.printf("0x%x ", b);
        }
    }
}
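
If only the signature is of interest, it may be enough to read the first few bytes instead of the whole stream. This is a small sketch assuming a plain InputStream (a file, a URL stream or a blob stream); HeaderReader and readHeader are hypothetical names, and the sample bytes in main are just the PNG signature followed by two padding bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class HeaderReader {

    // Reads up to 'count' bytes from the start of the stream.
    // The result is shorter than 'count' if the stream ends early.
    static byte[] readHeader(InputStream in, int count) throws IOException {
        byte[] header = new byte[count];
        int total = 0;
        while (total < count) {
            int n = in.read(header, total, count - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return Arrays.copyOf(header, total);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00};
        try (InputStream in = new ByteArrayInputStream(sample)) {
            // Print the first 8 bytes as hex, masking to avoid sign issues.
            for (byte b : readHeader(in, 8)) {
                System.out.printf("%02X ", b & 0xFF);
            }
        }
    }
}
```

The loop is there because a single read call is allowed to return fewer bytes than requested, which can happen with network streams.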
Gyanendra Dwivedi
  • Thanks, gyan, for your quick response. I am trying to understand two things here. First, the duplicate has some links which say that for PNG, JPEG and GIF the first bytes should always be as given on their site. However, I am seeing values which are totally different from them. Second, is this approach of judging whether a blob is an image reliable, or do you have any other ideas? EDIT: I understand that changing the file extension will give the same result. But like I said, I am confused by the values being different (at least for JPG) from what has been given. Thanks. – chebus Sep 12 '13 at 08:48
  • Please use the updated code just to recalculate your byte values and let's see the output. – Gyanendra Dwivedi Sep 12 '13 at 09:15