
First, I apologize for any English mistakes I make; being 15 and French doesn't help...

I'm trying to write a PNG decoder with the help of the file format specification (http://www.libpng.org/pub/png/spec/1.2/PNG-Contents.html), but I came across a weird problem.

The specification says that the first eight bytes of a PNG file always contain the following (decimal) values: 137 80 78 71 13 10 26 10.

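When I test this simple program:

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream file("test.png");

    // Read everything up to the first newline character.
    string line;
    getline(file, line);

    // Print the first byte as a character.
    cout << line[0] << endl;
}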

The output is "ë", which represents 137 in the ASCII table. Good, it matches the first byte.

However, when I do int ascii_value = line[0];, the output value is -119, which is not a correct ASCII value.

When I try the same thing with another character like "e", it does output the correct ASCII value.

Could someone explain what I am doing wrong and what the solution is? I personally think it's an issue with the extended ASCII table, but I'm not sure.

Thank you, everybody! I'll cast my signed char to an unsigned one!

user2018626
  • For a start, the std::string [] operator returns a (reference to a) char and not an int. Depending on what locale you run and what your limits file says -119 is a perfectly valid number as a result there :) – Lieuwe Jan 28 '13 at 15:53
  • Your formatting and English aren't bad at all, no need for apologies :) – Samuele Mattiuzzo Jan 28 '13 at 15:53
  • Intentionally or not, the first line is hilarious, given that the English in this post is better than in many (if not most) posts online … – Konrad Rudolph Jan 28 '13 at 15:55
  • In what encoding is ë 137? In Unicode, iso8859-1, and iso8859-15, ë is 235. – aschepler Jan 28 '13 at 15:57

6 Answers


Your system's char type is signed, which is why values thereof can be negative.

You need to be explicit and drop the sign:

const unsigned char value = (unsigned char) line[0];

Note that -119 and 137 have the same 8-bit pattern (0x89) in two's complement, which your machine seems to be using. So the bits themselves really are correct; it's all about interpreting them properly.
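
The equivalence can be checked at compile time. The following is only a sketch: the helper names as_unsigned_byte and as_signed_value are made up for illustration, and it assumes 8-bit bytes (std::int8_t and std::uint8_t exist).

```cpp
#include <cstdint>

// Hypothetical helper: convert a signed 8-bit value to unsigned.
// The conversion is modulo 256, so the bit pattern is preserved.
constexpr std::uint8_t as_unsigned_byte(std::int8_t v) {
    return static_cast<std::uint8_t>(v);
}

// Hypothetical helper: the value a signed 8-bit two's complement type
// would report for the same bit pattern (subtract 256 when the top bit is set).
constexpr int as_signed_value(std::uint8_t v) {
    return v >= 128 ? static_cast<int>(v) - 256 : static_cast<int>(v);
}

// Both views of the pattern 0x89 agree with the numbers in the question.
static_assert(as_unsigned_byte(-119) == 137, "-119 and 137 share one bit pattern");
static_assert(as_signed_value(137) == -119, "137 read as signed is -119");
```

Values below 128, such as the code for "e", are the same in both views, which is why the problem only shows up for the first signature byte.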

unwind

char in C++ can be either signed or unsigned (see note 1); which one is up to the implementation. In the case of your compiler (as in most, actually), it appears to be signed.

Any character value greater than 127 is represented as a negative number. -119 happens to correspond to the unsigned character value 137. In other words, the following holds:

unsigned char c = 137;
assert(static_cast<signed char>(c) == -119);

But note that this is implementation-specific so you cannot in general rely on these values.


1) And is a distinct type from both signed char and unsigned char.
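
Both points (three distinct types, and implementation-defined signedness) can be observed with standard type traits. This is a small sketch; plain_char_is_signed and char_kind are hypothetical names used only here.

```cpp
#include <limits>
#include <type_traits>

// char is its own type, even though it behaves like one of the other two.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Hypothetical helper: reports which behaviour this implementation chose for char.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Three overloads can coexist precisely because the three types are distinct.
constexpr int char_kind(char) { return 0; }
constexpr int char_kind(signed char) { return 1; }
constexpr int char_kind(unsigned char) { return 2; }
```

A character literal such as 'a' selects the char overload, while a value cast to unsigned char selects the third one.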

Konrad Rudolph
  • Nice point about `char` being distinct from both `signed char` and `unsigned char`. – Agentlien Jan 28 '13 at 15:55
  • Yes. The 3 distinct char variants is something that is very easy to overlook, since many textbooks/classes never bother to mention it. I also find it interesting that the C standard also goes out of its way to say the same thing. See C11 6.2.5.14-15. From 6.2.5.15: "The implementation shall define `char` to have the same range, representation, and behavior as either `signed char` or `unsigned char`", then the footnote which reads in part "Irrespective of the choice made, `char` is a separate type from the other two and is not compatible with either." C99 has similar text. – Kevin Cathcart Jan 28 '13 at 18:45
  • @Kevin I actually only realised this after writing a program which in some context tested for type equality using traits, and getting an error since `char` was neither a `signed char` nor an `unsigned char`. – Konrad Rudolph Jan 28 '13 at 18:54
  • I came across it when tracing an interesting hack that relied on this, and came across a class with a method with 3 overloads, one on `char*`, one on `unsigned char*`, and one on `signed char*`. Needless to say that shook me, since I had been following C++0x development very closely (reading and understanding most working papers) and felt that I knew the language very well. – Kevin Cathcart Jan 28 '13 at 19:14

ASCII only covers 0 .. 127. There is no 137 in the ASCII table.

There is no such thing as "the extended ASCII table" either. There are dozens of (mutually incompatible) ASCII extensions. Heck, technically even Unicode is "extended ASCII".

You're getting -119 because in your compiler char is a signed type, covering values from -128 to 127. (-119 is 137 - 256). You can get the value you expect by explicitly casting to unsigned char:

int value = static_cast<unsigned char>(line[0]);
melpomene

That's what happens when you allow sign extension. Characters in the extended ASCII table have their high bit (sign bit) set.

As an 8-bit value, -119 is 0x89. 137 is also 0x89.

Try

int ascii_value = line[0] & 0x00FF;

or

int ascii_value = (unsigned char)line[0];
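
The difference between the implicit conversion and the two fixes can be sketched as follows. The function names are hypothetical, and the masking variant assumes a two's complement representation.

```cpp
// Implicit conversion: where char is signed, the sign bit is extended,
// so the byte 0x89 becomes -119.
constexpr int sign_extended(char c) { return c; }

// Fix 1: mask away the extended sign bits after promotion to int.
constexpr int masked(char c) { return c & 0xFF; }

// Fix 2: convert to unsigned char first, then widen to int.
constexpr int via_cast(char c) { return static_cast<unsigned char>(c); }
```

For the byte 0x89, masked and via_cast both yield 137, whereas sign_extended yields -119 on an implementation where char is signed.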
Ben Voigt

137 = -119 = 0x89. If you cast it as (unsigned)(unsigned char)(line[0]), it will print the integer value 137.

The type char (which is the character type of std::string) is [usually] a signed type, ranging from -128 to 127. Any byte value higher than 127 will come out as a negative number.

Mats Petersson

C++ does not specify whether char is a signed or unsigned type. This means that "extended" ASCII characters (those outside the range 0..127, with their top bit set) might be interpreted as negative values; and it looks like that's what your compiler does.

To get the unsigned value you're expecting, you'll need to explicitly convert it to an unsigned char type:

int ascii_value = static_cast<unsigned char>(line[0]); // Should be 137
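
Applied to the original goal, the same conversion lets all eight signature bytes be compared against the decimal values quoted from the specification. This is only a sketch: the names kPngSignature, starts_with_png_signature and file_has_png_signature are hypothetical. Opening the file in binary mode and using read instead of getline are also assumptions of this sketch rather than something taken from the answers; getline stops at the first byte with value 10, which is the sixth byte of the signature.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// The eight signature bytes from the PNG specification, as decimal values.
const unsigned char kPngSignature[8] = {137, 80, 78, 71, 13, 10, 26, 10};

// Returns true if `data` begins with the PNG signature.
bool starts_with_png_signature(const std::string& data) {
    if (data.size() < sizeof kPngSignature) return false;
    for (std::size_t i = 0; i < sizeof kPngSignature; ++i) {
        // Compare as unsigned char so that 137 is not seen as -119.
        if (static_cast<unsigned char>(data[i]) != kPngSignature[i]) return false;
    }
    return true;
}

// Reads the first eight bytes of a file in binary mode and checks them.
// Returns false if the file cannot be opened or is shorter than eight bytes.
bool file_has_png_signature(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::binary);
    char buffer[8];
    if (!file.read(buffer, sizeof buffer)) return false;
    return starts_with_png_signature(std::string(buffer, sizeof buffer));
}
```

Calling file_has_png_signature("test.png") would then report whether the file from the question starts with the expected bytes.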
Mike Seymour