Portable Network Graphics Overview

The general layout of any given PNG file looks like this:

File Header: An 8-byte signature.

Chunks: Chunks of data ranging from image properties to the actual image itself.
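To make that layout concrete, here is a minimal sketch that checks the signature and then walks the chunk headers without decoding anything. It relies on the PNG rule that each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The names ChunkInfo, read_u32_be, and list_png_chunks are made up for this illustration, and CRCs are skipped rather than verified.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// The fixed 8-byte signature that starts every PNG file.
const std::array<unsigned char, 8> PNG_SIGNATURE = {
    0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

// Hypothetical record describing one chunk header.
struct ChunkInfo {
    std::string   type;    // four ASCII letters, e.g. "IHDR"
    std::uint32_t length;  // size of the chunk's data field in bytes
};

// Reads a 4-byte big-endian unsigned integer; returns false on a short read.
bool read_u32_be(std::istream& in, std::uint32_t& value)
{
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4)) return false;
    value = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
          | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
    return true;
}

// Checks the signature, then records every chunk header up to and
// including IEND. Chunk data and CRC bytes are skipped, not inspected.
bool list_png_chunks(std::istream& in, std::vector<ChunkInfo>& chunks)
{
    unsigned char sig[8];
    if (!in.read(reinterpret_cast<char*>(sig), 8)) return false;
    if (!std::equal(std::begin(sig), std::end(sig), PNG_SIGNATURE.begin()))
        return false;

    for (;;) {
        std::uint32_t length = 0;
        char type[4];
        if (!read_u32_be(in, length)) return false;
        if (!in.read(type, 4)) return false;
        chunks.push_back({ std::string(type, 4), length });

        // Skip the data field plus the 4-byte CRC that follows it.
        in.seekg(static_cast<std::streamoff>(length) + 4, std::ios::cur);
        if (!in) return false;
        if (chunks.back().type == "IEND") return true;
    }
}
```

With a real file you would pass a std::ifstream opened with std::ios::binary. Note that nothing here treats the bytes as text: every read is for a fixed number of bytes, and the result of each read is checked.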


The Problem

I want to read PNG files in C++ without using any external libraries. I want to do this to gain a deeper understanding of both PNG format and the C++ programming language.

I started off using fstream to read images byte-by-byte, but I can't get past the header of any PNG file. I try using read( char*, int ) to put the bytes into char arrays, but read fails on every byte after the header.

As seen above, I think my program always gets caught up on the end-of-file 1A byte. I'm developing on Windows 7 for Windows 7 and Linux machines.


Some of My (Old) Code

#include <iostream>
#include <fstream>
#include <cstring>
#include <cstddef>

const char* INPUT_FILENAME = "image.png";

int main()
{
  std::ifstream file;
  size_t size = 0;

  std::cout << "Attempting to open " << INPUT_FILENAME << std::endl;

  file.open( INPUT_FILENAME, std::ios::in | std::ios::binary | std::ios::ate );
  char* data = 0;

  file.seekg( 0, std::ios::end );
  size = file.tellg();
  std::cout << "File size: " << size << std::endl;
  file.seekg( 0, std::ios::beg );

  data = new char[ size - 8 + 1 ];
  file.seekg( 8 ); // skip the header
  file.read( data, size );
  data[ size ] = '\0';
  std::cout << "Data size: " << std::strlen( data ) << std::endl;
}

The output is always similar to this:

Attempting to open image.png
File size: 1768222
Data size: 0

The file size is correct, but data size is clearly incorrect. Note that I try to skip the header (avoid the end-of-file character) and also account for this when declaring the size of char* data.

Here are some data size values when I modify the file.seekg( ... ); line accordingly:

file.seekg( n );             data size
----------------             ---------
0                            8
1                            7
2                            6
...                          ...
8                            0
9                            0
10                           0

Some of My New Code

#include <iostream>
#include <fstream>
#include <cstring>
#include <cstddef>

const char* INPUT_FILENAME = "image.png";

int main()
{
  std::ifstream file;
  size_t size = 0;

  std::cout << "Attempting to open " << INPUT_FILENAME << std::endl;

  file.open( INPUT_FILENAME, std::ios::in | std::ios::binary | std::ios::ate );
  char* data = 0;

  file.seekg( 0, std::ios::end );
  size = file.tellg();
  std::cout << "File size: " << size << std::endl;
  file.seekg( 0, std::ios::beg );

  data = new char[ size - 8 + 1 ];
  file.seekg( 8 ); // skip the header
  file.read( data, size );
  data[ size ] = '\0';
  std::cout << "Data size: " << ((unsigned long long)file.tellg() - 8) << std::endl;
}

I essentially just modified the Data size: line. A thing to note is the output of the Data size: line is always really close to the maximum value of whatever type I cast file.tellg() to.

user3745189
  • 6
    `strlen` stops at the first null terminator, you're assuming the null terminator you add to the end of the buffer is the only one. Usually not a good idea to treat binary data as a text string. – Captain Obvlious Jun 26 '15 at 18:38
  • @CaptainObvlious But how can I get these data size values then? What you said sort of makes sense, but also seems to imply that every value after the header is a null terminator according to these values: file.seekg(0), data size: 8; file.seekg(1), data size: 7; file.seekg(2), data size: 6; ... file.seekg(8), data size: 0; file.seekg(9), data size: 0; file.seekg(10), data size: 0; ... – user3745189 Jun 26 '15 at 18:42
  • I'm not implying that _at all_. The data stored in a PNG should be treated as binary data, which means you should never assume a null terminator, and `strlen` is not going to be the correct way to go. You need to examine the file format for PNG and start interpreting the data for what it actually is instead of assuming it's just a bunch of strings. – Captain Obvlious Jun 26 '15 at 18:44
  • 2
    @user3745189 `I want to read PNG files in C++ without using anything other than STL` Your code doesn't use anything from STL. If it did, at the very least you would have replaced `new[]` with `std::vector` – PaulMcKenzie Jun 26 '15 at 18:48
  • @CaptainObvlious Oh, so like a byte could represent some pixel, but at the same time be a null terminator? Still, I should be able to shove each byte into a char, right? I mean, not every byte will be a null terminator – user3745189 Jun 26 '15 at 18:48
  • @PaulMcKenzie Excuse my lack of understanding about STL. I meant that I don't want to use any external libraries. – user3745189 Jun 26 '15 at 18:49
  • 1
    @user3745189 The image data itself can have null characters. Those null characters in the image data have nothing to do with string termination -- it is just the data that is there. Thus, you don't incorporate using string functions that stop on nulls. – PaulMcKenzie Jun 26 '15 at 18:49
  • @PaulMcKenzie I see. So how can I read the data then? Is there some sort of "bit reader" I can use that isn't molded to deal with strings of characters? – user3745189 Jun 26 '15 at 18:51
  • Be advised that "a byte could represent a pixel" is *not* true for raw PNG file data. If you hoped it would, you need to read the specifications again, and then decide whether you want to implement deflate decompression, or pick a simpler image to start with (TGA, BMP, or PCX). – Jongware Jun 26 '15 at 19:02
  • FYI PNG files will be compressed, I wouldn't re-write zlib though... – mark Jun 26 '15 at 19:22
  • If you do not care about knowing the format, you can use the MFC `CImage` class and its `Load` method – sergiol Dec 06 '16 at 16:09

3 Answers


Your (new) code contains two essential errors:

data = new char[ size - 8 + 1 ];
file.seekg( 8 ); // skip the header
file.read( data, size );  // <-- here
data[ size ] = '\0';      // <-- and here

First off, you want to read the data without the 8-byte prefix, and you allocate the right amount of space (not really, see further). But at that point, size still holds the total number of bytes in the file, including the 8-byte prefix. Since you ask to read size bytes and there are only size-8 bytes remaining, the file.read operation fails. You don't check for errors, so you do not notice that file is in a failed state at that point. With an error check you would have seen this:

if (file)
  std::cout << "all characters read successfully.";
else
  std::cout << "error: only " << file.gcount() << " could be read";

Because file is invalid from that point on, all operations such as your later file.tellg() return -1.
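A small sketch of that failure mode, using an in-memory stream in place of the file so it can be run anywhere. ReadResult and try_read are hypothetical names for this illustration; the stream calls themselves (read, gcount, tellg) are the standard ones.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical summary of one read attempt.
struct ReadResult {
    bool            ok;        // stream state after read()
    std::streamsize got;       // gcount(): bytes actually stored in the buffer
    long long       position;  // tellg() afterwards; -1 once the stream has failed
};

// Seeks to `skip`, asks for `request` bytes, and reports what happened.
ReadResult try_read(std::istream& in, std::streamoff skip, std::streamsize request)
{
    std::vector<char> buffer(static_cast<std::size_t>(request));
    in.seekg(skip);
    in.read(buffer.data(), request);

    ReadResult r;
    r.ok       = static_cast<bool>(in);
    r.got      = in.gcount();
    r.position = static_cast<long long>(in.tellg());
    return r;
}
```

For a 20-byte stream, try_read(in, 8, 20) asks for more than the 12 bytes that remain, so ok is false, got is 12, and position is -1. Asking for exactly 12 bytes instead leaves the stream good with position 20.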

The second error is data[size] = '\0'. Your buffer is not that large; its last valid index is size-8, so the line should be data[size-8] = 0. As written, you are writing into memory beyond what you allocated earlier, which causes Undefined Behavior and may lead to problems later on.

But that last operation clearly shows you are thinking in terms of character strings. A PNG file is not a string, it is a binary stream of data. Allocating +1 for its size and setting this value to 0 (with the unnecessary "character-wise" way of thinking, with '\0') is only useful if the input file is of a string type – say, a plain text file.
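The "data size" values in the question follow directly from this. Here is a sketch that assumes the file starts the way a valid PNG does: the 8-byte signature (which contains no zero bytes) followed by the IHDR chunk's length field, 13 stored as four big-endian bytes. The helper names png_prefix and strlen_from are invented for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// First 12 bytes of a PNG: the signature, then the big-endian length
// field (13) of the IHDR chunk. One extra 0 byte is appended so that
// strlen can never run past the end of the buffer.
std::vector<char> png_prefix()
{
    const unsigned char raw[] = {
        0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A,   // signature
        0x00, 0x00, 0x00, 0x0D };                      // IHDR length = 13
    std::vector<char> bytes(raw, raw + sizeof raw);
    bytes.push_back('\0');
    return bytes;
}

// What strlen reports when it starts at `offset` into the bytes.
std::size_t strlen_from(const std::vector<char>& bytes, std::size_t offset)
{
    return std::strlen(bytes.data() + offset);
}
```

Starting at offsets 0, 1, and 2, strlen reports 8, 7, and 6, because it runs until the first zero byte at offset 8. Starting at offsets 8, 9, or 10 it reports 0, because it begins on a zero byte. The bytes are all there; strlen is simply the wrong tool for counting them, whereas bytes.size() (or gcount() after a read) gives the real count.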

A simple fix for your current issues is this (well, and add error checking for all your file operations):

file.read( data, size-8 );

However, I would strongly advise you to look at a much simpler file format first. The PNG file format is compact and well documented; but it is also versatile, complicated, and contains highly compressed data. For a beginner it is way too hard.

Start with an easier image format. ppm is a deliberately simple format, good to start with. tga, old but easy, introduces you to several more concepts such as bit depths and color mapping. Microsoft's bmp has some nice little caveats but can still be considered 'beginner friendly'. If you are interested in simple compression, the basic Run Length Encoding of a pcx is a good starting point. After mastering that you could look into the gif format, which uses the much harder LZW compression.

Only once you have succeeded in implementing parsers for these should you look at PNG again.

Jongware

If you want to know how much data you read from the file then just use tellg() again.

data = new char[ size - 8 + 1 ];
file.seekg( 8 ); // skip the header
file.read( data, size );
data[ size ] = '\0';
if(file.good()) // make sure we had a good read.
    std::cout << "Data size: " << file.tellg() - 8 << std::endl;

There is also an error in your code where it reads the data. You pass size to read, but size is the size of the whole file, which is 8 bytes more than what remains once you skip the header. The correct code is

#include <cstddef>
#include <fstream>
#include <iostream>

const char* INPUT_FILENAME = "ban hammer.png";

int main()
{
    std::ifstream file;
    std::size_t size = 0;

    std::cout << "Attempting to open " << INPUT_FILENAME << std::endl;

    file.open(INPUT_FILENAME, std::ios::in | std::ios::binary);
    char* data = 0;

    file.seekg(0, std::ios::end);
    size = file.tellg();
    std::cout << "File size: " << size << std::endl;
    file.seekg(0, std::ios::beg);

    data = new char[size - 8 + 1];
    file.seekg(8); // skip the header
    file.read(data, size - 8); // only size - 8 bytes remain after the header
    data[size - 8] = '\0'; // last valid index of the buffer
    // gcount() is the number of bytes the last read() actually stored
    std::cout << "Data size: " << file.gcount() << std::endl;
    std::cin.get();
    delete[] data;
    return 0;
}
NathanOliver
  • I used slightly modified code ( `((size_t)file.tellg() - 8)` ) instead, and this is the output: `File size: 1768222` and `Data size: 4294967287` which makes me thinking something spooky is happening, since data size is way bigger than file size. It looks like overflow. – user3745189 Jun 26 '15 at 18:55
  • @user3745189: Such a large number makes more sense interpreted as hex. You'll see it's a huge number, or probably (signed) a small negative one ... `-9`, which is the `-1` from a failed `tellg()` minus your `8`. – Jongware Jun 26 '15 at 19:00
  • @user3745189 There was an error in your code as well. I edited my answer to working code. – NathanOliver Jun 26 '15 at 19:13
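The arithmetic behind that 4294967287 can be reproduced in isolation. This sketch assumes a 32-bit size_t, which is what the printed value suggests; std::uint32_t stands in for it, and the function name is made up for the illustration.

```cpp
#include <cstdint>

// Mimics ((size_t)file.tellg() - 8) on a platform with a 32-bit size_t.
// A failed stream makes tellg() return -1; converting that to an unsigned
// 32-bit type wraps it to 4294967295 before the 8 is subtracted.
std::uint32_t data_size_after_failed_read(long long tellg_result)
{
    return static_cast<std::uint32_t>(tellg_result) - 8u;
}
```

Passing -1 yields 4294967287, or 0xFFFFFFF7 in hex, which read as a signed 32-bit value is -9. So the number is not a byte count at all; it is the error value from tellg() pushed through unsigned arithmetic.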

Solution 1:

file.read( data, size - 8 ); // only size - 8 bytes remain after the header
std::size_t data_size = static_cast<std::size_t>( file.tellg() ) - 8;
std::cout << "Data size: " << data_size << std::endl;

Even easier: Solution 2:

std::size_t data_size = file.readsome( data, size );
std::cout << "Data size: " << data_size << std::endl;

file.readsome() returns the number of bytes read.

cdonat
  • 1
    I would not suggest using `readsome()` as http://en.cppreference.com/w/cpp/io/basic_istream/readsome says that it might not always do what you want. – NathanOliver Jun 26 '15 at 19:13
  • @NathanOliver. I never had that issue, so I wasn't aware of it. Thanks for pointing it out. You can use file.rdbuf()->sgetn(data, size) instead: http://en.cppreference.com/w/cpp/io/basic_streambuf/sgetn – cdonat Jun 26 '15 at 19:30
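For reference, a minimal sketch of the sgetn route mentioned in the last comment, shown on an in-memory stream. read_up_to is a hypothetical wrapper; the one thing to keep in mind is that going through rdbuf() bypasses the stream's own bookkeeping, so gcount() and the eof/fail flags are not updated and the return value is the only report of how much was read.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>

// Reads at most max_bytes from the stream's buffer into dest and returns
// how many bytes were actually copied. The stream's state flags and
// gcount() are left untouched because the call goes straight to the
// underlying stream buffer.
std::streamsize read_up_to(std::istream& in, char* dest, std::streamsize max_bytes)
{
    return in.rdbuf()->sgetn(dest, max_bytes);
}
```

Asking for 10 bytes when only 4 remain returns 4 rather than putting the stream into a failed state, which is the behaviour the question was looking for from read.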