12

I'm trying to understand how data is stored in the IDAT chunk. I'm writing a small PHP class and I can retrieve most of the chunk information, but what I get for IDAT doesn't match the pixels of my image:

It is a 2×2 px truecolour image with alpha (bit depth 8).

But when I interpret IDAT data like this:

current(unpack('H*',gzuncompress($idat_data)));

I get

00000000ffffff00ffffff000000

I don't understand how this can match the pixels. Or is my code corrupting the data?

Thanks for your help!

EDIT: I get

08d705c101010000008010ff4f1750a93029e405fb

as the compressed data in hex, so it seems I lose several bytes when uncompressing.

MatTheCat

3 Answers

10

Use gzinflate, but first strip the first 2 bytes (the zlib header) and the last 4 bytes (the Adler-32 checksum).

$contents = file_get_contents($in_filename);
$pos = 8; // skip header

$color_types = array('Greyscale','unknown','Truecolour','Indexed-colour','Greyscale with alpha','unknown','Truecolour with alpha');
$len = strlen($contents);
$safety = 1000;
do {
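    // Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC (all big-endian).
    // unpack() returns a 1-indexed array, so take element 1 directly.
    $chunk_len = unpack('N', substr($contents,$pos,4))[1];

    $chunk_type = substr($contents,$pos+4,4);

    $chunk_data = substr($contents,$pos+8,$chunk_len);

    $chunk_crc = unpack('N', substr($contents,$pos+8+$chunk_len,4))[1];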
    echo "chunk length:$chunk_len(dec) 0x" . sprintf('%08x',$chunk_len) . "h<br>\n";
    echo "chunk crc   :0x" . sprintf('%08x',$chunk_crc) . "h<br>\n";
    echo "chunk type  :$chunk_type<br>\n";
    echo "chunk data  $chunk_type bytes:<br>\n"  . chunk_split(bin2hex($chunk_data)) . "<br>\n";
    switch($chunk_type) {
        case 'IHDR':
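        // unpack() returns a 1-indexed array; array_values() re-indexes it from 0 for list()
        list($width,$height) = array_values(unpack('N2', substr($chunk_data,0,8)));
        list($depth,$Color_type,$Compression_method,$Filter_method,$Interlace_method) = array_values(unpack('C*', substr($chunk_data,8)));
        echo "Width:$width,Height:$height,depth:$depth,Color_type:$Color_type(" . $color_types[$Color_type] . "),Compression_method:$Compression_method,Filter_method:$Filter_method,Interlace_method:$Interlace_method<br>\n";
        // samples per pixel for each colour type; assumes a bit depth of 8 or 16
        $channels = array(0=>1, 2=>3, 3=>1, 4=>2, 6=>4);
        $bytes_per_pixel = $channels[$Color_type] * $depth / 8;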
        break;

        case 'PLTE':
        $palette = array();
        for($i=0;$i<$chunk_len;$i+=3) {
            $tupl = bin2hex(substr($chunk_data,$i,3));
            $palette[] = $tupl;
            if($i && ($i % 30 == 0)) {
                echo "<br>\n";
            }
            echo '<span style="color:#' . $tupl . ';">[' . $tupl . ']</span>'; // CSS hex colours need the leading #
        }
        echo print_r($palette,true) . "<br>";
        break;

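        case 'IDAT':
        // Assumes the whole zlib stream sits in this one IDAT chunk; a file with
        // several IDAT chunks needs their data concatenated before inflating.
        $compressed = substr($chunk_data,2,$chunk_len - 6); // drop the 2-byte zlib header and the 4-byte Adler-32 trailer
        $decompressed = gzinflate($compressed);
        echo "decompressed chunk data " . strlen($decompressed) . " bytes:<br>\n"  . chunk_split(bin2hex($decompressed),2 + $width * $bytes_per_pixel * 2) . "<br>\n";
        // The loop below only makes sense for 8-bit indexed images (one palette index per byte)
        if($Color_type == 3 && $depth == 8) {
            for($row=0; $row<$height; $row++) {
                for($col=1; $col<=$width; $col++) {
                    // ord() turns the raw byte into its numeric value; column 0 is the filter type byte
                    $index = ord($decompressed[$row*($width+1)+$col]);
                    echo '<span style="color:#' . $palette[$index] . ';">' . $index . '</span>';
                }
                echo "<br>\n";
            }
        }
        // TODO use filters described here:
        // http://www.w3.org/TR/PNG/#9Filters
        // first byte of scan line is filter type
        break;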

    }
    $pos += $chunk_len + 12;
    echo "<hr>";
} while(($pos < $len) && --$safety);
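
As a minimal sketch of why those byte counts work (assuming PHP with the zlib and hash extensions): a zlib stream is a 2-byte header, the raw deflate data, and a 4-byte Adler-32 checksum of the uncompressed bytes. `zlib_unwrap_inflate` is a hypothetical helper name, not part of PHP or of the code above, and it ignores the rarely used preset-dictionary flag.

```php
<?php
// Hypothetical helper: check the zlib framing by hand, then inflate the payload.
function zlib_unwrap_inflate(string $zlib): string
{
    // Header = CMF and FLG bytes. Read as one big-endian number, the pair must
    // be a multiple of 31, and the low nibble of CMF must be 8 (deflate).
    $header = unpack('n', substr($zlib, 0, 2))[1];
    if (($header & 0x0F00) !== 0x0800 || $header % 31 !== 0) {
        throw new RuntimeException('not a zlib stream');
    }
    // Everything between the header and the 4-byte trailer is raw deflate data.
    $raw = gzinflate(substr($zlib, 2, -4));
    if ($raw === false) {
        throw new RuntimeException('inflate failed');
    }
    // The trailer is the big-endian Adler-32 of the uncompressed data.
    if (bin2hex(substr($zlib, -4)) !== hash('adler32', $raw)) {
        throw new RuntimeException('Adler-32 mismatch');
    }
    return $raw;
}

$raw  = hex2bin('00000000ffffff00ffffff000000'); // the 14 bytes from the question
$zlib = gzcompress($raw);                        // gzcompress() emits the zlib framing
var_dump(zlib_unwrap_inflate($zlib) === $raw);   // bool(true)
var_dump(gzuncompress($zlib) === $raw);          // bool(true)
```

gzuncompress reads the same framing in one call, so either route should give identical output. In the question's hex dump the header is 08d7 (a multiple of 31) and the trailer is 29e405fb, which is the Adler-32 of the 14 bytes shown, so those 14 bytes appear to be the complete stream.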
Charlie
  • thanks, inflate works now but I get "00000000ffffff00ffffff000000" (14 bytes), how are they used to get pixels? – MatTheCat Sep 03 '11 at 13:45
  • In order to get good compression, the PNG format applies filters before compression. The filters do things like: if two scan lines one-over-the-other are almost the same, the pixels on the lower line that match the pixel above, get changed to zeros. So when you're done you have a crap-ton of zeros and compression is really good. So you need to reverse that and *undo* the filters after decompression. see http://www.w3.org/TR/PNG/#9Filters – Charlie Sep 03 '11 at 13:56
  • Right, and a filter "transforms the byte sequence in a scanline to an equal length sequence of bytes preceded by a filter type byte". So shouldn't I have 18 bytes of uncompressed data (1 byte per channel * 4 channels * 4 pixels + 2 filter bytes)? – MatTheCat Sep 03 '11 at 14:05
  • Seems right. Maybe there really isn't an alpha channel??? If the first byte has some other meaning and there isn't an alpha then you would have 00 (the mystery byte) 000000 (rgb) ffffff (rgb) + another mystery byte + ffffff and 000000. So perhaps each scan line has a byte to describe the filter scheme of that line. Sorry, it's been too long since I did any coding on this. – Charlie Sep 03 '11 at 14:21
  • That's it! My mistake was to think Gimp saved my image with an alpha channel. But the colour type is 2 for "Truecolour", so it's only 3 channels (R, G, B). The mystery bytes are the filter type used for each scanline (none in this case). Thanks! – MatTheCat Sep 03 '11 at 14:32
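
For images where the filter byte is not 0, the filters discussed in these comments have to be reversed row by row. The following is a rough sketch of that step, assuming a non-interlaced image, filter method 0, and at least one whole byte per pixel; `png_unfilter` is a hypothetical name, and `$bpp` is the number of bytes per complete pixel.

```php
<?php
// Hypothetical helper: undo PNG scanline filtering (types 0-4) and return
// each row as an array of byte values.
function png_unfilter(string $data, int $width, int $height, int $bpp): array
{
    $stride = $width * $bpp;              // bytes per row, not counting the filter byte
    $prev   = array_fill(0, $stride, 0);  // the row above the first row counts as all zeros
    $rows   = [];
    for ($y = 0; $y < $height; $y++) {
        $offset = $y * ($stride + 1);
        $type   = ord($data[$offset]);    // first byte of every scanline is the filter type
        $line   = array_values(unpack('C*', substr($data, $offset + 1, $stride)));
        for ($i = 0; $i < $stride; $i++) {
            $a = $i >= $bpp ? $line[$i - $bpp] : 0;  // left neighbour (already reconstructed)
            $b = $prev[$i];                          // byte above
            $c = $i >= $bpp ? $prev[$i - $bpp] : 0;  // byte above-left
            switch ($type) {
                case 0: $pred = 0; break;                    // None
                case 1: $pred = $a; break;                   // Sub
                case 2: $pred = $b; break;                   // Up
                case 3: $pred = intdiv($a + $b, 2); break;   // Average
                case 4:                                      // Paeth
                    $p  = $a + $b - $c;
                    $pa = abs($p - $a);
                    $pb = abs($p - $b);
                    $pc = abs($p - $c);
                    $pred = ($pa <= $pb && $pa <= $pc) ? $a : ($pb <= $pc ? $b : $c);
                    break;
                default:
                    throw new RuntimeException("unknown filter type $type");
            }
            $line[$i] = ($line[$i] + $pred) & 0xFF;  // arithmetic is modulo 256
        }
        $rows[] = $prev = $line;
    }
    return $rows;
}

// One row of two RGB pixels stored with the Sub filter (type 1):
// the second pixel is stored as its difference from the first.
print_r(png_unfilter("\x01\x0a\x0a\x0a\x05\x05\x05", 2, 1, 3));
```

With filter type 0 on every row, as in the question's image, the function simply returns the bytes after each filter byte unchanged.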
4
00000000 ffffff00 ffffff00 0000xxxx
black    white    white    black

That's what I can tell (which is correct) ... but you are missing 2 bytes at the end.

Andreas
  • I thought there was a filter type byte in each scanline? Could the missing bytes come from a bad decompression algorithm? – MatTheCat Sep 03 '11 at 12:41
  • I'm not read up on PNG at all really, but the data you present seem to correspond to what you should be getting, except that it isn't all of it ... so I personally can't help you with why that might be happening: http://www.w3.org/TR/PNG/#11IDAT @leonbloy might be right about multiple IDAT blocks, but I find it strange that a block so small would be split up... are you sure you are uncompressing all the bytes? – Andreas Sep 03 '11 at 12:43
  • Thanks but http://www.w3.org/TR/PNG/#4Concepts.EncodingFiltering it seems filter types should be present in data, so more bytes would be missing?? (I know there's only one IDAT chunk in my case ^^) – MatTheCat Sep 03 '11 at 12:47
  • Filter method is part of the IHDR block, not the IDAT block it seems... but I'm unable to decode the description of how the filters actually work though. http://www.w3.org/TR/PNG/#11IHDR – Andreas Sep 03 '11 at 12:53
  • Yep, but it should be a filter *type* in each scanline if I understand well. But it seems I missed something else ; I've edited my question. – MatTheCat Sep 03 '11 at 12:59
  • @MatTheCat as far as I can tell from the little I've now read... isn't it just the same filter type for all of them? – Andreas Sep 03 '11 at 13:02
  • The first *byte* of each line is `00` here (the PNG line filter). After that, you get two RGB triples: `00 00 00` and `ff ff ff` for the first line, `ff ff ff` and `00 00 00` for the 2nd. – Jongware Oct 02 '14 at 09:32
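
The byte counts in this thread can be checked with a little arithmetic: each scanline is one filter byte plus width × channels × depth / 8 bytes of pixel data. Below is a small sketch under that assumption (non-interlaced image); `PNG_CHANNELS` and `png_raw_size` are hypothetical names introduced only for this example.

```php
<?php
// Samples per pixel for each PNG colour type (from the IHDR colour type field).
const PNG_CHANNELS = [0 => 1, 2 => 3, 3 => 1, 4 => 2, 6 => 4];

// Hypothetical helper: expected size of the decompressed IDAT data.
function png_raw_size(int $width, int $height, int $colorType, int $depth): int
{
    // Row length in bytes, rounded up for bit depths below 8.
    $rowBytes = intdiv($width * PNG_CHANNELS[$colorType] * $depth + 7, 8);
    return $height * (1 + $rowBytes);   // +1 for the filter type byte of each row
}

echo png_raw_size(2, 2, 2, 8), "\n";    // 14 (truecolour, what the file really contains)
echo png_raw_size(2, 2, 6, 8), "\n";    // 18 (truecolour with alpha, what was expected)

// Split the 14 bytes from the question into rows and RGB pixels.
$raw = hex2bin('00000000ffffff00ffffff000000');
foreach (str_split($raw, 1 + 2 * 3) as $y => $row) {
    $filter = ord($row[0]);
    $pixels = str_split(bin2hex(substr($row, 1)), 6);
    echo "row $y: filter $filter, pixels ", implode(' ', $pixels), "\n";
}
// row 0: filter 0, pixels 000000 ffffff
// row 1: filter 0, pixels ffffff 000000
```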
4

To add to @Andreas (+1) parsing, two things to note:

  1. A PNG file can have (and often has) many IDAT chunks; their data must be concatenated to recover the compressed zlib stream. http://www.w3.org/TR/PNG/#10CompressionFSL

  2. Gzip/Compress/Deflate are all related but are not exactly the same. PNG uses deflate/inflate, so I'd try gzdeflate/gzinflate.
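
A possible way to handle the first point, sketched under the assumption that the file is a well-formed PNG held in memory: walk the chunks, append the data of every IDAT chunk, and decompress the result once. `png_idat_stream` is a hypothetical helper name. It uses gzuncompress, which reads the zlib wrapper (header and checksum) around the deflate data.

```php
<?php
// Hypothetical helper: join all IDAT chunk data and decompress it as one zlib stream.
function png_idat_stream(string $png): string
{
    $pos  = 8;                 // skip the 8-byte PNG signature
    $len  = strlen($png);
    $zlib = '';
    while ($pos + 12 <= $len) {
        $chunkLen = unpack('N', substr($png, $pos, 4))[1];
        $type     = substr($png, $pos + 4, 4);
        if ($type === 'IDAT') {
            $zlib .= substr($png, $pos + 8, $chunkLen);
        }
        if ($type === 'IEND') {
            break;
        }
        $pos += 12 + $chunkLen;  // length + type + data + CRC
    }
    $raw = gzuncompress($zlib);
    if ($raw === false) {
        throw new RuntimeException('could not decompress IDAT data');
    }
    return $raw;
}

// Synthetic test file: one zlib stream split across two IDAT chunks.
$chunk = function (string $type, string $data): string {
    return pack('N', strlen($data)) . $type . $data . pack('N', crc32($type . $data));
};
$z   = gzcompress("\x00\x01\x02\x03");
$png = "\x89PNG\r\n\x1a\n" . $chunk('IDAT', substr($z, 0, 5)) . $chunk('IDAT', substr($z, 5)) . $chunk('IEND', '');
var_dump(png_idat_stream($png) === "\x00\x01\x02\x03"); // bool(true)
```

Decompressing each IDAT chunk on its own would fail on such a file, because the second chunk starts in the middle of the stream.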

leonbloy
  • I tried but I get a data error when using gzinflate =/ (my image has only one IDAT chunk) – MatTheCat Sep 03 '11 at 12:42
  • @MatTheCat I've implemented PNG read/write using the Java Deflater/Inflater classes and it worked flawlessly. Perhaps you could try stripping the first two bytes? `gzinflate(substr($idat_data, 2))` ? http://www.php.net/manual/en/function.gzinflate.php#70875 – leonbloy Sep 03 '11 at 12:50
  • I just noticed that length part of IDAT chunk is smaller than length of its data, I think it's where the problem comes from but I can't guess why – MatTheCat Sep 03 '11 at 12:54
  • @MatTheCat length part of IDAT? I can't seem to find anything on that? – Andreas Sep 03 '11 at 12:59
  • @MatTheCat "The length counts only the data field, not itself, the chunk type, or the CRC." It seems to me as if it's likely that you use it as the length of the entire IDAT block. Just double-checked your hexcode above, I can find the DATA to be 21 bytes. – Andreas Sep 03 '11 at 13:08
  • You're right, so I do get the true pixel data, and I don't understand what's going wrong -_-' – MatTheCat Sep 03 '11 at 13:11
  • @MatTheCat anyway, I replicated it in PHP and I get the same thing, 2 bytes missing at the end for some reason... and the filter method is NONE. – Andreas Sep 03 '11 at 13:20
  • IHDR tells us filter method is 0, so it should be a filter type for each scanline (http://www.w3.org/TR/PNG/#9Filter-types) – MatTheCat Sep 03 '11 at 13:25
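
The chunk layout discussed in these comments (the length field counts only the data, not itself, the type, or the CRC) can be checked with a sketch like the one below. `png_read_chunk` is a hypothetical helper; it assumes PHP's crc32() matches the PNG CRC, which is computed over the chunk type and data but not the length field.

```php
<?php
// Hypothetical helper: read the chunk starting at byte offset $pos.
function png_read_chunk(string $png, int $pos): array
{
    $length = unpack('N', substr($png, $pos, 4))[1];   // size of the data field only
    $type   = substr($png, $pos + 4, 4);
    $data   = substr($png, $pos + 8, $length);
    $crc    = unpack('N', substr($png, $pos + 8 + $length, 4))[1];
    return [
        'type'   => $type,
        'data'   => $data,
        'crc_ok' => $crc === crc32($type . $data),     // CRC covers type + data
        'next'   => $pos + 12 + $length,               // 4 length + 4 type + data + 4 CRC
    ];
}

// The IEND chunk is always these 12 bytes: zero length, "IEND", CRC ae426082.
$chunk = png_read_chunk(hex2bin('0000000049454e44ae426082'), 0);
var_dump($chunk['type'], $chunk['crc_ok'], $chunk['next']);
// string(4) "IEND", bool(true), int(12)
```

For the IDAT chunk in the question, a length of 21 means the chunk occupies 33 bytes in the file, and a failing CRC check would be a quick sign that the data field was cut at the wrong offset.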