9

How can I extract the image from a JPEG-compressed TIFF file?

I've read the bytes according to the StripOffsets and StripByteCounts fields, but I couldn't load an image from them.

user393679

4 Answers

6

Old-style TIFF-JPEG (compression type 6) basically stuffed a normal JFIF file inside a TIFF wrapper. The newer style (compression type 7) allows the JPEG table data (Huffman and quantization tables) to be stored in a separate tag (0x015B, JPEGTables). This lets you put strips of JPEG data with SOI/EOI markers in the file without repeating the Huffman and quantization tables in every strip. This is probably what you're seeing with your file: the individual strips begin with the sequence FFD8 but are missing the Huffman and quantization tables. This is how Photoshop products usually write these files.
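
The splice described in this answer can be sketched in Java (the language of the JAI answer further down). This is a minimal sketch, not a definitive implementation: `JpegTablesSplicer` and `splice` are hypothetical names, and it assumes the JPEGTables value is a complete abbreviated stream (SOI, table segments, EOI) and that each strip starts with its own SOI. The result is one standalone JPEG per strip, so its frame height is that strip's row count rather than the full image height.

```java
import java.io.ByteArrayOutputStream;

final class JpegTablesSplicer {
    // True if data[pos..pos+1] is the two-byte marker FF <marker>.
    private static boolean hasMarker(byte[] data, int pos, int marker) {
        return pos >= 0 && pos + 1 < data.length
                && (data[pos] & 0xFF) == 0xFF && (data[pos + 1] & 0xFF) == marker;
    }

    // Builds a standalone JPEG stream from the JPEGTables value and one strip.
    static byte[] splice(byte[] jpegTables, byte[] strip) {
        if (!hasMarker(jpegTables, 0, 0xD8) || !hasMarker(strip, 0, 0xD8)) {
            throw new IllegalArgumentException("Both inputs must start with SOI (FFD8)");
        }
        int end = jpegTables.length - 2; // position of the trailing EOI
        if (end < 2 || !hasMarker(jpegTables, end, 0xD9)) {
            throw new IllegalArgumentException("JPEGTables must end with EOI (FFD9)");
        }
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(jpegTables.length + strip.length);
        out.write(0xFF);
        out.write(0xD8);                       // a single SOI for the merged stream
        out.write(jpegTables, 2, end - 2);     // table segments only, without SOI/EOI
        out.write(strip, 2, strip.length - 2); // the strip after its own SOI, EOI included
        return out.toByteArray();
    }
}
```

Calling `JpegTablesSplicer.splice(tables, strip)` once per strip gives one byte array per strip that can be written to its own `.jpg` file.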

BitBank
  • I tried adding the quantization table at the beginning of the image data, but it doesn't help. Where must I insert that table? – user393679 Apr 21 '11 at 22:21
  • How about the Huffman tables? Is the image broken into multiple strips? Do you want one JPEG file for each strip? Add more details. – BitBank Apr 22 '11 at 04:34
  • There is one JPEG, which is broken into multiple strips. – user393679 Apr 22 '11 at 05:28
  • You didn't answer my question about the Huffman tables. In order to decode JPEG-compressed data, the decompressor needs the quantization and Huffman tables. Take a look at a valid JPEG and you will see a series of tables after the metadata (FFEx). – BitBank Apr 22 '11 at 05:35
  • Where can I find the Huffman tables, and where must I put them? – user393679 Apr 22 '11 at 05:48
  • But what if there is no "JPEGTables" tag? – user393679 Apr 22 '11 at 16:36
  • If you have a TIFF file with type 7 JPEG data and the strips of compressed data are missing the Huffman and quantization tables, then they must be stored in the JPEGTables tag. You should be able to take the info in the JPEGTables tag and insert it into the image data (accounting for SOI/EOI), and it should work. – BitBank Apr 22 '11 at 16:48
  • I've extracted an image from my TIFF file, but it looks completely different from the original: a pink strip. What is the problem? – user393679 Apr 24 '11 at 09:40
  • I found that the image is cut off. The height value in the FFC0 segment is wrong. – user393679 Apr 24 '11 at 10:30
  • Send me the file and I'll take a look (bitbank (at) pobox (dot) com). – BitBank Apr 24 '11 at 17:11
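
To check the height problem mentioned in the comments above, the frame header can be read directly from the extracted bytes. This is a minimal sketch, assuming a baseline, extended or progressive frame (markers FFC0 to FFC2) that appears before the scan data; `JpegFrameInfo` and `readFrameSize` are hypothetical names. In a new-style TIFF each strip is typically its own JPEG, so the height found here would be expected to match the strip's row count, not the full image height.

```java
final class JpegFrameInfo {
    // Returns {height, width} from the first SOF0..SOF2 segment,
    // or null if no such segment is found before the scan starts.
    static int[] readFrameSize(byte[] jpeg) {
        int pos = 2; // skip SOI (FFD8)
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return null; // not positioned on a marker
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {
                pos++; // fill byte before a marker
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {
                return null; // SOS or EOI reached without a frame header
            }
            // Segment length is big-endian and includes its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (marker >= 0xC0 && marker <= 0xC2) {
                if (pos + 9 > jpeg.length) {
                    return null; // truncated frame header
                }
                // Layout: marker(2) length(2) precision(1) height(2) width(2)
                int height = ((jpeg[pos + 5] & 0xFF) << 8) | (jpeg[pos + 6] & 0xFF);
                int width = ((jpeg[pos + 7] & 0xFF) << 8) | (jpeg[pos + 8] & 0xFF);
                return new int[] {height, width};
            }
            pos += 2 + length; // jump to the next segment
        }
        return null;
    }
}
```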
3

Using JAI:

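import java.awt.image.RenderedImage;
import com.sun.media.jai.codec.ByteArraySeekableStream;
import com.sun.media.jai.codec.ImageCodec;
import com.sun.media.jai.codec.ImageDecoder;
import com.sun.media.jai.codec.SeekableStream;
import com.sun.media.jai.codec.TIFFDirectory;

// TIFF tag numbers
int TAG_COMPRESSION             = 259;
int TAG_JPEG_INTERCHANGE_FORMAT = 513;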

int COMP_JPEG_OLD  = 6;
int COMP_JPEG_TTN2 = 7;

SeekableStream stream = new ByteArraySeekableStream(imageData);
TIFFDirectory    tdir = new TIFFDirectory(stream, 0);
int       compression = tdir.getField(TAG_COMPRESSION).getAsInt(0);

// Decoder name
String decoder2use = "tiff";
if (compression == COMP_JPEG_OLD) { 
    // Special handling for old/unsupported JPEG-in-TIFF format:
    // {@link: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4929147 }
    stream.seek(tdir.getField(TAG_JPEG_INTERCHANGE_FORMAT).getAsLong(0));
    decoder2use = "jpeg";
}

// Decode image
ImageDecoder  dec = ImageCodec.createImageDecoder(decoder2use, stream, null);
RenderedImage img = dec.decodeAsRenderedImage();

Great solution, it helped me a lot. Just to add: if the TIFF has multiple pages, you have to repeat the read with a different directory index in the TIFFDirectory constructor and repeat all of the above for each page.

TIFFDirectory tdir = new TIFFDirectory(stream, 1);
Alexander Pavlov
Dmitry Mitskevich
2

The problem with the libtiff library mentioned in another answer is that it decodes the image and then saves it recompressed, which means another loss of quality in the case of JPEG. That said, I can accomplish the same thing without a third-party library, just by calling the GDI+ methods of the .NET Framework.

The original poster is trying to get the JPEG binary without having to recompress it, and that is exactly what I am trying to do as well.

This is a possible solution if you can live with the quality loss and do not want to use anything but .NET library classes:

    public static int SplitMultiPage(string sourceFileName, string targetPath)
    {
        using (Image multipageTIFF = Image.FromFile(sourceFileName))
        { 
            int pageCount = multipageTIFF.GetFrameCount(FrameDimension.Page);

            if (pageCount > 1)
            {
                string sFileName = Path.GetFileNameWithoutExtension (sourceFileName);
                for (int i = 0; i < pageCount; i++)
                {                        
                    multipageTIFF.SelectActiveFrame(FrameDimension.Page, i);

                    // A single frame could also have a different format, e.g. JPG, PNG, BMP, etc.
                    // So that the file gets the correct extension, we take one from the codec's description.
                    // Interestingly, in the TIFF case (the only multi-frame case) RawFormat always gives us the TIFF codec
                    // rather than that of the frame.
                    ImageCodecInfo codec = Helpers.GetEncoder(multipageTIFF.RawFormat);
                    string sExtension = codec.FilenameExtension.Split(new char[] { ';' })[0];
                    sExtension = sExtension.Substring(sExtension.IndexOf('.') + 1);
                    string newFileName = Path.Combine(targetPath, string.Format("{0}_{1}.{2}", sFileName, i + 1, sExtension));

                    EncoderParameters encoderParams = new EncoderParameters(2);
                    encoderParams.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.LastFrame);

                    // For 1-bit TIFFs we use CCITT4 compression, since that gives the best results
                    switch (GetCompressionType(multipageTIFF))
                    {

                        case 1: // No compression -> BMP?
                            encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionNone);
                            break;
                        case 2: // CCITT modified Huffman RLE (32773, PackBits, has no case in this switch)
                            encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionRle);
                            break;
                        case 3: // CCITT Group 3 fax encoding
                            encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionCCITT3);
                            break;
                        case 4: // CCITT Group 4 fax encoding
                            encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionCCITT4);
                            break;
                        case 5: // LZW
                            encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionLZW);
                            break;
                        case 6: // JPEG ('old-style' JPEG, later overridden in Technote2)
                        case 7: // Technote2 overrides old-style JPEG compression, and defines 7 = JPEG ('new-style' JPEG)
                            {
                                codec = Helpers.GetEncoder(ImageFormat.Jpeg);
                                // Quality must be passed as a long (90L); a plain int literal binds to a different overload
                                encoderParams.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 90L);
                            }
                            break;
                    }

                    multipageTIFF.Save(newFileName, codec, encoderParams);
                }
            }

            return pageCount;
        }
    }

The helper method used above:

    public static ImageCodecInfo GetEncoder(ImageFormat format)
    {

        // The result is passed to Image.Save, so look the codec up among the encoders
        ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();

        foreach (ImageCodecInfo codec in codecs)
        {
            if (codec.FormatID == format.Guid)
            {
                return codec;
            }
        }
        return null;
    }

Reading the compression flag:

    public static int GetCompressionType(Image image)
    {
        /*  TIFF Tag Compression
            IFD     Image
            Code        259 (hex 0x0103)
            Name        Compression
            LibTiff name        TIFFTAG_COMPRESSION
            Type        SHORT
            Count       1
            Default     1 (No compression)
            Description

            Compression scheme used on the image data.

            The specification defines these values to be baseline:

            1 = No compression
            2 = CCITT modified Huffman RLE
            32773 = PackBits compression, aka Macintosh RLE

            Additionally, the specification defines these values as part of the TIFF extensions:

            3 = CCITT Group 3 fax encoding
            4 = CCITT Group 4 fax encoding
            5 = LZW
            6 = JPEG ('old-style' JPEG, later overridden in Technote2)

            Technote2 overrides old-style JPEG compression, and defines:

            7 = JPEG ('new-style' JPEG)

            Adobe later added the deflate compression scheme:

            8 = Deflate ('Adobe-style')

            The TIFF-F specification (RFC 2301) defines:

            9 = Defined by TIFF-F and TIFF-FX standard (RFC 2301) as ITU-T Rec. T.82 coding, using ITU-T Rec. T.85 (which boils down to JBIG on black and white).
            10 = Defined by TIFF-F and TIFF-FX standard (RFC 2301) as ITU-T Rec. T.82 coding, using ITU-T Rec. T.43 (which boils down to JBIG on color). 
        */
        int compressionTagIndex = Array.IndexOf(image.PropertyIdList, 0x103);

        PropertyItem compressionTag = image.PropertyItems[compressionTagIndex];

        // The tag is an unsigned SHORT; values such as 32773 overflow a signed 16-bit read
        return BitConverter.ToUInt16(compressionTag.Value, 0);
    }
  • This method: public static int GetCompressionType(Image image) should end with "return BitConverter.ToUInt16(compressionTag.Value, 0);" as you do not want a signed integer due to the 16 bit array that is returned. – Dan Waterbly Mar 08 '13 at 18:36
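
The signed versus unsigned point in the comment above matters for values such as 32773 (PackBits), which do not fit in a signed 16-bit integer. A small illustration follows, written in Java (the language of the JAI answer above) rather than C#; `TiffShortDemo` and its methods are hypothetical names, and the byte order is passed in explicitly because a TIFF file can be either little- or big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TiffShortDemo {
    // Reads the first two bytes as a signed 16-bit value (the problematic variant).
    static int readSigned(byte[] value, ByteOrder order) {
        return ByteBuffer.wrap(value).order(order).getShort(0);
    }

    // Reads the first two bytes as an unsigned 16-bit value, as TIFF SHORT requires.
    static int readUnsigned(byte[] value, ByteOrder order) {
        return ByteBuffer.wrap(value).order(order).getShort(0) & 0xFFFF;
    }
}
```

For the little-endian bytes `05 80`, `readSigned` yields -32763 while `readUnsigned` yields 32773.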
0

If you are trying to extract the actual image from a TIFF, JPEG or otherwise, you are best off using a library such as libtiff. TIFF is a very complicated spec, and while you might be able to do this yourself and handle one or two classes of images, chances are you wouldn't be able to handle the other cases that arise frequently, especially "old-style" JPEG, a sub-format that was foisted upon TIFF and doesn't fit well into the overall design.

My company, Atalasoft, makes a .NET product that includes a very good codec for TIFF. If you only need to worry about single page images, our free product will work just fine for you.

In the .NET realm, you could also look at Bit Miracle's managed version of libtiff. It is a pretty decent port of the library.

plinth
  • Thank you for the answer. I cannot use any libraries. I'm interested in parsing only new-style JPEG-compressed TIFFs. Could you please tell me what is wrong with the JPEG-compressed data I've retrieved from the TIFF file according to the StripOffsets and StripByteCounts fields? – user393679 Apr 21 '11 at 13:17
  • I see it contains the JPEG start marker, 0xFFD8, but for some reason it cannot be opened. What must I do with this data to get an image? – user393679 Apr 21 '11 at 13:25
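
For readers who, like the asker, cannot use a library: the strip locations come from the StripOffsets (273) and StripByteCounts (279) fields of the image file directory. The following is a minimal sketch of reading the first IFD in Java, not a full TIFF parser. It assumes a classic (non-BigTIFF) file held entirely in memory, reads only SHORT and LONG fields, and skips every other type, including the UNDEFINED-typed JPEGTables tag; `TiffIfdReader` and `readFirstIfd` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

final class TiffIfdReader {
    // Reads the SHORT (type 3) and LONG (type 4) fields of the first IFD
    // into a map from tag number to its values.
    static Map<Integer, long[]> readFirstIfd(byte[] tiff) {
        ByteOrder order;
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            order = ByteOrder.LITTLE_ENDIAN;
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            order = ByteOrder.BIG_ENDIAN;
        } else {
            throw new IllegalArgumentException("Not a TIFF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Bad TIFF magic number");
        }
        int ifd = buf.getInt(4);                  // offset of the first IFD
        int entries = buf.getShort(ifd) & 0xFFFF; // number of 12-byte entries
        Map<Integer, long[]> fields = new LinkedHashMap<>();
        for (int i = 0; i < entries; i++) {
            int entry = ifd + 2 + 12 * i;
            int tag = buf.getShort(entry) & 0xFFFF;
            int type = buf.getShort(entry + 2) & 0xFFFF;
            int count = buf.getInt(entry + 4);
            int size = type == 3 ? 2 : type == 4 ? 4 : 0;
            if (size == 0) {
                continue; // other field types are skipped in this sketch
            }
            // Values that fit in four bytes are stored inline; otherwise
            // the last four bytes of the entry hold an offset to them.
            int pos = (long) size * count <= 4 ? entry + 8 : buf.getInt(entry + 8);
            long[] values = new long[count];
            for (int k = 0; k < count; k++) {
                values[k] = size == 2
                        ? buf.getShort(pos + 2 * k) & 0xFFFF
                        : buf.getInt(pos + 4 * k) & 0xFFFFFFFFL;
            }
            fields.put(tag, values);
        }
        return fields;
    }
}
```

With the returned map, `fields.get(273)` and `fields.get(279)` give the offset and byte count of each strip, and `fields.get(259)` gives the compression type.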