
I've been trying to write a program that extracts all the JPEG files from a selected disk image. I know there are 7 JPEG files in the disk image I'm testing it on, yet my code only extracts 2 of them. What might I be doing wrong to cause this?

#!/usr/bin/python
import sys
from binascii import hexlify

def main(): 
    filename = 'disk.img'
    i = 1
    f = open(filename, 'rb')
    for data in iter(lambda:f.read(4), ""):
            if (data == '\xff\xd8\xff\xe1' or data == '\xff\xd8\xff\xe0'):
                print data.encode('hex')
                print f.tell()
            while(data != '\xff\xd9'):
                new_filename = "%03d.jpg" % i
                newfile = open(new_filename, 'ab')
                newfile.write(data)
                data = f.read(2)
            newfile.close() 
            print "%03d.jpg extracted!" % i             
            i = i+1
            #position = f.tell()

            #f.seek(position+16)


    f.close()
    print "EOF"


if __name__ == '__main__':
    main()

1 Answer

There are existing tools for that. See http://www.cgsecurity.org/wiki/PhotoRec

I suppose the problem with the sample code is that it reads the image in fixed chunks of 4 bytes (outer loop) or 2 bytes (inner loop). A JPEG that doesn't start at an offset divisible by the chunk size of the loop that is currently running straddles two reads, so its marker is never matched and the file is never found.
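To illustrate the alignment point, here is a minimal sketch (Python 3, not the asker's Python 2) that searches for the markers at every byte offset instead of in fixed-size chunks. The names `find_jpeg_spans` and `extract_jpegs` are made up for this example, and it assumes that the whole image fits in memory, that files are stored contiguously, and that the first `ff d9` after a start marker ends the file. That last assumption can fail, for example when an EXIF thumbnail carries its own end marker, so treat the results as candidates only.

```python
SOI = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # end-of-image marker


def find_jpeg_spans(data):
    """Return (start, end) offsets of candidate JPEGs in data; end is exclusive."""
    spans = []
    pos = 0
    while True:
        # bytes.find checks every byte offset, so alignment does not matter
        start = data.find(SOI, pos)
        if start == -1:
            break
        end = data.find(EOI, start + len(SOI))
        if end == -1:
            break
        end += len(EOI)
        spans.append((start, end))
        pos = end  # continue scanning after this candidate
    return spans


def extract_jpegs(image_path):
    """Write each candidate to 001.jpg, 002.jpg, ... (assumes the image fits in memory)."""
    with open(image_path, "rb") as f:
        data = f.read()
    for i, (start, end) in enumerate(find_jpeg_spans(data), 1):
        with open("%03d.jpg" % i, "wb") as out:
            out.write(data[start:end])
```

Calling `extract_jpegs('disk.img')` would write one file per candidate. For images too large to read at once, the same search could be run over an `mmap` of the file.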

MrTux
  • This does not exactly answer the question – Padraic Cunningham Aug 15 '14 at 14:46
  • Yeah, that's true. However, PhotoRec's source code is GPL-licensed, so one could examine it. Finding JPEGs isn't that easy (why reinvent the wheel?). Looking for the magic number (ff d8 in the case of JPEG) isn't enough, because you don't know where an image file starts or how long it is; you have to try parsing a JPEG at every position where you find the magic number. – MrTux Aug 15 '14 at 14:50
  • Well, maybe the OP is trying to learn by coding; libs are great but don't necessarily teach you how code works. – Padraic Cunningham Aug 15 '14 at 14:51
  • Well, it does. The code seems to check for jpegs at even positions in the image only. – MrTux Aug 15 '14 at 20:38
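
As a rough illustration of the "try parsing at every position" idea from the comments, the sketch below (Python 3) checks whether the bytes after a start marker look like a chain of well-formed JPEG header segments leading to a start-of-scan marker (`ff da`). The helper name `looks_like_jpeg` is hypothetical, and the check is deliberately simplified: it does not handle fill bytes between segments and does not decode any image data. It is not how PhotoRec does it.

```python
import struct


def looks_like_jpeg(data, start):
    """Hypothetical check: do well-formed segments lead from SOI at start to SOS?"""
    if data[start:start + 2] != b"\xff\xd8":
        return False
    pos = start + 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # every header segment must begin with 0xFF
        marker = data[pos + 1]
        if marker == 0xDA:
            return True   # start of scan reached: plausibly a real JPEG
        # Markers without a length field should not appear in the header.
        if marker in (0x00, 0x01, 0xFF) or 0xD0 <= marker <= 0xD9:
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False
        pos += 2 + length
    return False
```

A scanner could call this at each offset where `ff d8` is found and skip candidates for which it returns `False`.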