
I need to delete a page from a multipage TIFF file. I am currently working in .NET but can move to another language if someone knows how to do it there.

The page would be either the second-to-last or the last page in the file. I need to do it without decompressing the previous pages in the file, so creating a new TIFF and copying over all the pages I still want is not an option.

I have code that does that already, but the TIFF files I am working with are around 1.0 GB to 3.0 GB, heavily compressed, so this is extremely time-consuming. If I can remove just the part of the file I want to drop, rather than create a new file, it will go much faster.

The page that I need to remove is very small compared to the rest of the file, around 500*500 pixels, as is the page that may or may not come after it.

What I have tried so far: the LibTiff.Net library, found here:

http://bitmiracle.com/libtiff/

After experimenting with it for a while I asked the developer about my issue, and they said there currently isn't support for this. I also looked into ImageMagick a bit, but I haven't been able to figure out how to do it there either.

Does anyone have any helpful ideas?

Bobrovsky
keepitreall89
  • I found a partial answer on Stack Overflow when I was searching via Google, but I lost the link somewhere along the way. The question wasn't really related to mine, but the answer suggested using mmap in Python to read individual, specific bytes from large files, then using the struct module to convert those bytes to the type you want. I am working on a Python module right now that I hope will be able to do this. So far I can read all of the information in the first directory and determine whether I want to remove it, but I haven't gotten past that. – keepitreall89 Mar 07 '11 at 21:20
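The mmap-plus-struct idea mentioned in that comment can be sketched roughly as follows. This is a minimal illustration in Python 3, not code from the thread: the function name `read_tiff_header` is made up for the example, and it assumes a classic (non-BigTIFF) file.

```python
import mmap
import struct

def read_tiff_header(path):
    """Return (byte-order prefix, first IFD offset) of a classic TIFF file.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Map the file read-only; only the bytes we slice are actually read.
        fmap = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # Bytes 0-1: b"II" means little-endian, b"MM" means big-endian.
            order = fmap[0:2]
            if order == b"II":
                endian = "<"
            elif order == b"MM":
                endian = ">"
            else:
                raise ValueError("not a TIFF file")
            # Bytes 2-3: magic number 42. Bytes 4-7: offset of the first IFD.
            magic, first_ifd = struct.unpack(endian + "HI", fmap[2:8])
            if magic != 42:
                raise ValueError("not a classic TIFF (BigTIFF uses 43)")
            return endian, first_ifd
        finally:
            fmap.close()
```

Because the file is memory-mapped, slicing eight bytes out of a multi-gigabyte file does not load the rest of it.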

1 Answer


OK, I got a solution working in Python.

import mmap
from struct import unpack

def main():
    filename = input("Input file name: ")
    with open(filename, "r+b") as f:
        offList, compList = getOffsets(f)
        for off, comp in zip(offList, compList):
            print("offset: ", off, "\t Compression: ", comp)
        print("ran right")
        # remove the directory at index 3 of offList
        stripLabel(f, offList, 3)
        offList, compList = getOffsets(f)
        for off, comp in zip(offList, compList):
            print("offset: ", off, "\t Compression: ", comp)

def getOffsets(f):
    fmap = mmap.mmap(f.fileno(), 0)
    offsets = []
    compressions = []
    # bytes 0-1 give the byte order: "II" is little-endian, "MM" is big-endian
    endian = '<' if fmap[0:2] == b'II' else '>'
    # get TIFF version
    ver = unpack(endian + 'H', fmap[2:4])[0]
    if ver == 42:
        # get first IFD
        offset = unpack(endian + 'I', fmap[4:8])[0]
        while offset != 0:
            offsets.append(offset)
            # get number of tags in this IFD
            tags = unpack(endian + 'H', fmap[offset:offset+2])[0]
            # Compression defaults to 1 (none) when the tag is absent
            compression = "None"
            for i in range(tags):
                tagID = unpack(endian + 'H', fmap[offset+2:offset+4])[0]
                # if the tag is a compression, get the compression SHORT value and
                # if recognized use a string representation
                if tagID == 259:
                    tagValue = unpack(endian + 'H', fmap[offset+10:offset+12])[0]
                    if tagValue == 1:
                        compression = "None"
                    elif tagValue == 5:
                        compression = "LZW"
                    elif tagValue in (6, 7):
                        compression = "JPEG"
                    elif tagValue in (34712, 33003, 33005):
                        compression = "JP2K"
                    else:
                        compression = "Unknown"
                # each tag entry is 12 bytes long
                offset += 12
            # exactly one entry per IFD keeps both lists the same length
            compressions.append(compression)
            # the next-IFD offset follows the last tag entry
            offset = unpack(endian + 'I', fmap[offset+2:offset+6])[0]
    fmap.close()
    return offsets, compressions

# Overwrites the IFD at labelIndex with everything from the next IFD to the
# end of the file, then truncates the file. Offsets stored inside the moved
# IFD are absolute and are not adjusted here.
def stripLabel(f, offsetList, labelIndex):
    fmap = mmap.mmap(f.fileno(), 0)
    offsetLabel = offsetList[labelIndex]
    offsetMacro = offsetList[labelIndex+1]
    offsetEnd = fmap.size()
    macroSize = offsetEnd - offsetMacro
    # shift the tail of the file up in one call instead of byte by byte
    fmap.move(offsetLabel, offsetMacro, macroSize)
    fmap.flush()
    # the moved data ends at offsetLabel + macroSize, so that is the new size
    fmap.resize(offsetLabel + macroSize)
    fmap.close()

if __name__ == "__main__":
    main()

Tested it, and it seems to work fine. The stripLabel method is specifically meant to remove the second-to-last page/directory and shift the last one up, but it should in theory work for any directory other than the last, and it could easily be modified to remove the last one too. It needs at least as much free RAM as the size of the file you are working on, but it runs fast, and file size isn't an issue with most TIFFs. It isn't the most elegant approach; if someone has another, please post it.
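One thing that may be worth checking before shifting a directory: in TIFF, the StripOffsets (273) and TileOffsets (324) tags hold positions measured from the start of the file, so they do not follow an IFD that gets moved. The sketch below, in Python 3, lists those offsets for one IFD. The name `data_offsets` is hypothetical, and it assumes a classic TIFF held in any buffer that `struct.unpack_from` accepts (bytes, bytearray or an mmap).

```python
import struct

# Tags whose values are absolute file offsets of image data.
OFFSET_TAGS = {273: "StripOffsets", 324: "TileOffsets"}
# struct codes for the two field types these tags may use: SHORT (3), LONG (4).
TYPE_FORMATS = {3: "H", 4: "I"}

def data_offsets(buf, ifd_offset, endian="<"):
    """Return the absolute strip/tile offsets referenced by one IFD."""
    (count,) = struct.unpack_from(endian + "H", buf, ifd_offset)
    result = []
    for i in range(count):
        # Entries are 12 bytes each and start right after the 2-byte count.
        entry = ifd_offset + 2 + 12 * i
        tag, ftype, n = struct.unpack_from(endian + "HHI", buf, entry)
        if tag not in OFFSET_TAGS or ftype not in TYPE_FORMATS:
            continue
        fmt = endian + str(n) + TYPE_FORMATS[ftype]
        # Values that fit in 4 bytes are stored inline in the entry;
        # otherwise the entry holds an offset to the value array.
        if struct.calcsize(fmt) <= 4:
            pos = entry + 8
        else:
            (pos,) = struct.unpack_from(endian + "I", buf, entry + 8)
        result.extend(struct.unpack_from(fmt, buf, pos))
    return result
```

If an offset returned for the page being shifted lies at or after the point where the shift starts, that offset is likely to be stale once the bytes have moved.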

keepitreall89
  • In the case where you remove the second-to-last page from the image, is the page that is being shifted (that is, the last page of the image) still readable after you run your code? Normally, a TIFF IFD refers to the starting addresses of pixel data as absolute file offsets (that is, measured from the beginning of the file). After shifting the last image up, those references would no longer be valid. – rwong Apr 02 '11 at 18:57
  • Second question. Depending on the software used to generate the TIFF images, the IFD of a particular page can be located before the pixel data of the page, or after the data (or theoretically speaking, anywhere in the file). Did you test your code with the software which generates your input files? – rwong Apr 02 '11 at 19:00
  • Second question first. I know that the software that generates these images puts the pixel data BEFORE the associated IFD. So I'm not removing that data, because I am lazy. But nothing points to it, and therefore it doesn't matter to the software we use to read the files. – keepitreall89 Apr 03 '11 at 05:45
  • The first question. No, it wasn't readable; I was being stupid with that code. I have since changed the code so that it doesn't modify or attempt to remove any directories. Instead it just reroutes the pointer that points to the second-to-last page so that it points to the IFD for the last page. This effectively hides the second-to-last page from our reader. – keepitreall89 Apr 03 '11 at 05:48
  • And though this wasn't in the original question, in most cases I am also replacing the second-to-last page. In that case I create the new page as a TIFF file, append it to the end of the old TIFF file, change any offsets in the new page so they are still valid at its new position, and then reroute some IFD pointers to make it all work. – keepitreall89 Apr 03 '11 at 05:49
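The pointer-rerouting approach described in the comments above might look something like this in Python 3. It is a sketch under assumptions rather than the author's actual code: `ifd_chain` and `unlink_page` are hypothetical names, the file is assumed to be a classic TIFF, and `buf` is any writable buffer such as a bytearray or an mmap opened on a file in "r+b" mode.

```python
import struct

def ifd_chain(buf, endian="<"):
    """Return [(pointer_position, ifd_offset), ...] for every IFD in order.

    pointer_position is where the 4-byte offset leading to that IFD is stored.
    """
    chain = []
    pointer_pos = 4  # the header keeps the first-IFD offset at bytes 4-7
    (offset,) = struct.unpack_from(endian + "I", buf, pointer_pos)
    while offset != 0:
        chain.append((pointer_pos, offset))
        (count,) = struct.unpack_from(endian + "H", buf, offset)
        # The next-IFD field follows the 2-byte count and the 12-byte entries.
        pointer_pos = offset + 2 + 12 * count
        (offset,) = struct.unpack_from(endian + "I", buf, pointer_pos)
    return chain

def unlink_page(buf, index, endian="<"):
    """Skip page `index` by overwriting the pointer that leads to it."""
    pointer_pos, offset = ifd_chain(buf, endian)[index]
    (count,) = struct.unpack_from(endian + "H", buf, offset)
    # Where the skipped IFD itself points: the following IFD, or 0 at the end.
    (next_offset,) = struct.unpack_from(endian + "I", buf, offset + 2 + 12 * count)
    # Only 4 bytes change; the skipped IFD and its pixel data stay in the file.
    struct.pack_into(endian + "I", buf, pointer_pos, next_offset)
```

With a memory-mapped file this would be called as `unlink_page(fmap, index)` followed by `fmap.flush()`. The same call covers both cases from the question: for the last page the rewritten pointer simply becomes 0, which ends the chain. The file does not shrink, since the unlinked page's bytes are left in place.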