Converting from signed char to unsigned char and back again?

Question

I'm working with JNI and have an array of type jbyte, where jbyte is represented as a signed char, i.e. ranging from -128 to 127. The jbytes represent image pixels. For image processing, we usually want pixel components to range from 0 to 255. I therefore want to convert the jbyte value to the range 0 to 255 (i.e. the same range as unsigned char), do some calculations on the value and then store the result as a jbyte again.

How can I do this conversion safely?

I managed to get this code to work, where a pixel value is incremented by 30 but clamped to the value 255, but I don't understand if it's safe or portable:

 #define CLAMP255(v) (v > 255 ? 255 : (v < 0 ? 0 : v))

 jbyte pixel = ...
 pixel = CLAMP255((unsigned char)pixel + 30);

I'm interested to know how to do this in both C and C++.

You probably ought to add parentheses around the arguments in your macro, like this: `#define CLAMP255(v) ((v) > 255 ? 255 : ((v) < 0 ? 0 : (v)))` – qbert220, Feb 18 '11 at 12:44

score 139 · Accepted Answer · edited Sep 14 '17 at 02:20


This is one of the reasons why C++ introduced the new cast style, which includes static_cast and reinterpret_cast.

There are two things you can mean by a conversion from signed to unsigned. You might mean that you want the unsigned variable to contain the value of the signed variable modulo the maximum value of your unsigned type + 1. That is, if your signed char has a value of -128, then UCHAR_MAX+1 (256) is added for a value of 128, and if it has a value of -1, then UCHAR_MAX+1 is added for a value of 255. This is what static_cast does. On the other hand, you might mean that the bit pattern of the memory referenced by some variable should be interpreted as an unsigned byte, regardless of the signed integer representation used on the system, i.e. bit value 0b10000000 should evaluate to 128 and bit value 0b11111111 to 255. This is accomplished with reinterpret_cast.
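A minimal sketch of the two readings, assuming an 8-bit char. The helper names `by_value` and `by_bits` are made up for illustration. On a two's complement machine both return the same result.

```cpp
#include <cstring>

// Value conversion: the result is the input value modulo UCHAR_MAX + 1
// (256 for an 8-bit char), whatever the signed representation is.
inline unsigned char by_value(signed char x) {
    return static_cast<unsigned char>(x);
}

// Bit reinterpretation: copy the stored byte unchanged and read it as unsigned.
inline unsigned char by_bits(signed char x) {
    unsigned char y;
    std::memcpy(&y, &x, sizeof y);
    return y;
}
```

For example, `by_value(-100)` yields 156, since -100 + 256 = 156.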

Now, for the two's complement representation this happens to be exactly the same thing, since -128 is represented as 0b10000000 and -1 is represented as 0b11111111, and likewise for all values in between. However, other computers (usually older architectures) may use a different signed representation such as sign-and-magnitude or ones' complement. In ones' complement the 0b10000000 bit value would not be -128, but -127, so a static_cast to unsigned char would make this 129, while a reinterpret_cast would make this 128. Additionally, in ones' complement the 0b11111111 bit value would not be -1, but -0 (yes, this value exists in ones' complement), and would be converted to a value of 0 with a static_cast, but a value of 255 with a reinterpret_cast. Note that in ones' complement the value -128, which would map to the unsigned value 128, cannot be represented in a signed char at all: the range is only -127 to 127, because one bit pattern is taken up by -0.
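To make these numbers concrete, ones' complement can be simulated on an ordinary two's complement machine. The sketch below assumes 8-bit patterns. `ones_complement_value` and `converted_by_value` are hypothetical helpers, not part of any library.

```cpp
// Value that an 8-bit pattern denotes under ones' complement (simulated):
// a set sign bit means "negate the inverted low seven bits".
constexpr int ones_complement_value(unsigned bits) {
    return (bits & 0x80u) ? -static_cast<int>(~bits & 0x7Fu)
                          : static_cast<int>(bits & 0x7Fu);
}

// What a value conversion (static_cast) yields for that value: the value modulo 256.
constexpr unsigned converted_by_value(int value) {
    return static_cast<unsigned char>(value);
}

// 0b10000000 denotes -127, which converts to 129; reading the bits directly gives 128.
static_assert(ones_complement_value(0x80u) == -127, "sign bit alone is -127");
static_assert(converted_by_value(ones_complement_value(0x80u)) == 129u, "-127 + 256");

// 0b11111111 denotes negative zero, which converts to 0; reading the bits directly gives 255.
static_assert(ones_complement_value(0xFFu) == 0, "all ones is -0");
static_assert(converted_by_value(ones_complement_value(0xFFu)) == 0u, "-0 converts to 0");
```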

I have to say that the vast majority of computers will be using two's complement making the whole issue moot for just about anywhere your code will ever run. You will likely only ever see systems with anything other than two's complement in very old architectures, think '60s timeframe.

The syntax boils down to the following:

signed char x = -100;
unsigned char y;

y = (unsigned char)x;                    // C static
y = *(unsigned char*)(&x);               // C reinterpret
y = static_cast<unsigned char>(x);       // C++ static
y = reinterpret_cast<unsigned char&>(x); // C++ reinterpret

To do this in a nice C++ way with arrays:

jbyte memory_buffer[nr_pixels];
unsigned char* pixels = reinterpret_cast<unsigned char*>(memory_buffer);

or the C way:

unsigned char* pixels = (unsigned char*)memory_buffer;
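As a rough sketch of how such a pointer might be used, the function below brightens a buffer in place. `brighten` is an illustrative name, and `signed char` stands in for jbyte (JNI declares jbyte as signed char); an 8-bit two's complement platform is assumed.

```cpp
#include <cstddef>

// Adds `delta` to every pixel in place, clamping each result to 0..255.
inline void brighten(signed char* buffer, std::size_t count, int delta) {
    // View the same bytes as unsigned; no data is copied.
    unsigned char* pixels = reinterpret_cast<unsigned char*>(buffer);
    for (std::size_t i = 0; i < count; ++i) {
        int v = pixels[i] + delta;  // computed as int, so 255 + 30 does not wrap
        if (v < 0) v = 0;
        if (v > 255) v = 255;
        pixels[i] = static_cast<unsigned char>(v);
    }
}
```

Because the writes go through the unsigned view, no cast back to the signed type is needed; the caller's array simply holds the new bytes.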

answered Feb 18 '11 at 14:16 by wich · edited Sep 14 '17 at 02:20 by Sobi

Wow, great post! Right, so static_cast is what I want to use to be safe then. I'm a bit confused by the pointer cast in your sample though. Can I cast a signed char* pointer to an unsigned char* pointer and then read/write with the latter pointer safely? By safe, I mean from that point on I can just treat the latter pointer as if it pointed to an array of unsigned chars? That would make my code much cleaner as it means I don't have to cast back and forth. It seems to work from a quick test but I'm not sure if it's safe. – rbcc Feb 18 '11 at 15:33

Yes, given your program's semantics, you can safely cast an array of signed chars to a pointer to unsigned char, with which you effectively say, this memory is not an array of signed chars, but an array of unsigned chars. Note however that this would be a reinterpret_cast, not a static cast, but from the way you describe your problem I think a reinterpret cast is what you want. – wich Feb 18 '11 at 16:04
Added a bit to the answer about arrays – wich Feb 18 '11 at 16:09
Any chance you know a nice C++ way with std::vector arrays? I tried this, but it does not really work: `std::vector<char> buffer; std::vector<unsigned char> cache = std::vector<unsigned char>(reinterpret_cast<unsigned char*>(buffer.data()), reinterpret_cast<unsigned char*>(buffer.data() + buffer.size()));` – serup Jan 04 '17 at 07:56

@serup why does it not really work? It works fine for me. I would word it slightly differently, however, like so: `std::vector<char> buffer; unsigned char* ptr = reinterpret_cast<unsigned char*>(buffer.data()); std::vector<unsigned char> cache(ptr, ptr + buffer.size());` Do note that this will always make a copy of the buffer, while the plain array method will not. – wich Jan 04 '17 at 09:14

@serup The following does work perfectly fine for me and would avoid the copy of the buffer. I am not 100% sure however if this would be guaranteed to work by the standards. `std::vector<char> buffer; std::vector<unsigned char>& cache = reinterpret_cast<std::vector<unsigned char>&>(buffer);` – wich Jan 04 '17 at 09:21
@wich, yes I came to that conclusion myself, however as you said it will make a copy, or will it? - I guess no way around it – serup Jan 04 '17 at 10:28
@wich, what if you add std::move -- will it then create a copy? Example: `char* buf = buffer.data(); unsigned char* membuf = reinterpret_cast<unsigned char*>(buf); std::vector<unsigned char> vec(std::move(membuf), std::move(membuf) + buffer.size());` It works, however I am not sure if it actually copies – serup Jan 04 '17 at 10:34

@serup it will copy everything, `std::move` will only be useful in cases where the container elements themselves contain pointers to other memory. In that case the pointed to memory is not duplicated, but "moved" instead. For basic types such as `char`, `int`, `float`, etc. it will just be a plain copy. – wich Jan 04 '17 at 16:46
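The copying and non-copying options discussed in these comments might be sketched as follows, assuming the buffer is a `std::vector<char>`. `copy_as_unsigned` and `view_as_unsigned` are hypothetical helper names.

```cpp
#include <vector>

// Copies the bytes of `buffer` into a new vector of unsigned char.
inline std::vector<unsigned char> copy_as_unsigned(const std::vector<char>& buffer) {
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(buffer.data());
    return std::vector<unsigned char>(first, first + buffer.size());
}

// Views the existing storage as unsigned char without copying.
// The pointer is only valid while `buffer` is alive and not reallocated.
inline unsigned char* view_as_unsigned(std::vector<char>& buffer) {
    return reinterpret_cast<unsigned char*>(buffer.data());
}
```

The view avoids the copy by handing out a raw pointer into the vector's storage instead of a second vector.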

score 2 · Answer 2 · answered Feb 18 '11 at 11:51


Yes this is safe.

The C language uses a feature called integer promotion to increase the number of bits in a value before performing calculations. Therefore your CLAMP255 macro will operate at integer (probably 32-bit) precision. The result is assigned to a jbyte, which reduces the integer precision back to 8 bits to fit into the jbyte.
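The effect of promotion can be seen in a short sketch. It assumes a platform where int can represent every unsigned char value (the usual case); `add_promoted` and `add_truncated` are names invented for this illustration.

```cpp
#include <type_traits>

// unsigned char + int is computed as int.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) + 30), int>::value,
              "operands are promoted to int before the addition");

// Sum at int precision: 255 + 30 stays 285, so a later (v > 255) test can see it.
inline int add_promoted(unsigned char pixel, int delta) {
    return pixel + delta;
}

// Sum forced back into 8 bits: 255 + 30 wraps around to 29 and the overflow is lost.
inline unsigned char add_truncated(unsigned char pixel, int delta) {
    return static_cast<unsigned char>(pixel + delta);
}
```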

answered Feb 18 '11 at 11:51 by qbert220

Could you comment on what's happening when e.g. the signed char has value -100. I'm confused about what the value gets converted to, what gets converted back and if this is safe. – rbcc Feb 18 '11 at 12:04
You start off with -100, which is 10011100 in binary. Cast that to an unsigned char, and it results in 156. This is the value used for the calculations (add 30 and then test for <0 or >255). You'll end up with 186 (10111010 binary), which is converted back to a signed char, giving a value of -70. This all fits into 8 bit maths anyway. – qbert220 Feb 18 '11 at 12:39
If you started off with -1 (11111111 binary), then cast that to an unsigned char, you get 255. If you add 30 to that then you would get 285. If this was performed in 8 bit maths (i.e. without integer promotion) it would overflow and have the value 29. It would then be in the range 0-255 so would not get clamped. Since we have integer promotion, we have enough precision to represent 285, so the (v > 255) test will be true and the value will be clamped to 255. – qbert220 Feb 18 '11 at 12:39
@rebecca Your code actually never sees the number -100, the expression `(unsigned char)pixel` with pixel at value -100 will already give you a value of 156. – wich Feb 18 '11 at 12:54

score 1 · Answer 3 · answered Feb 18 '11 at 11:55


Do you realize that CLAMP255 returns 0 for v < 0 and 255 for v >= 0?
IMHO, CLAMP255 should be defined as:

#define CLAMP255(v) (v > 255 ? 255 : (v < 0 ? 0 : v))

Difference: If v is not greater than 255 and not less than 0: return v instead of 255
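In C++, a small function is one alternative to the macro: it evaluates its argument exactly once and needs no extra parentheses around it. A sketch, where `clamp255` is a name chosen here for illustration:

```cpp
// Clamps v to the range 0..255.
constexpr int clamp255(int v) {
    return v > 255 ? 255 : (v < 0 ? 0 : v);
}

static_assert(clamp255(285) == 255, "values above the range clamp to 255");
static_assert(clamp255(-5) == 0, "values below the range clamp to 0");
static_assert(clamp255(186) == 186, "values inside the range pass through");
```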

answered Feb 18 '11 at 11:55 by Daniel Hilgarth

Oops, I've updated this. It was a mistake I made when simplifying the code. – rbcc Feb 18 '11 at 11:58

score 0 · Answer 4 · answered Feb 18 '11 at 12:08


There are two ways to interpret the input data: either -128 is the lowest value and 127 is the highest (i.e. true signed data), or 0 is the lowest value, 127 is somewhere in the middle, and the next "higher" number is -128, with -1 being the "highest" value (that is, the most significant bit already got misinterpreted as a sign bit in a two's complement notation).

Assuming you mean the latter, the formally correct way is

signed char in = ...
unsigned char out = (in < 0)?(in + 256):in;

which at least gcc properly recognizes as a no-op.
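The mapping can be checked against a plain cast over the whole input range. This is a sketch assuming an 8-bit signed char covering -128 to 127; `undo_sign_misread` and `matches_plain_cast` are illustrative names.

```cpp
// The "add 256" mapping, written out as a function.
constexpr int undo_sign_misread(signed char in) {
    return in < 0 ? in + 256 : in;
}

// Returns true if the mapping agrees with a plain cast for every signed char value.
inline bool matches_plain_cast() {
    for (int v = -128; v <= 127; ++v) {
        signed char in = static_cast<signed char>(v);
        if (undo_sign_misread(in) != static_cast<unsigned char>(in)) {
            return false;
        }
    }
    return true;
}
```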

answered Feb 18 '11 at 12:08 by Simon Richter

Are you saying the casts I'm doing are safe or unsafe then? – rbcc Feb 18 '11 at 12:12
The cast is somewhat unsafe from a C standard lawyer standpoint, but is safe enough on most common systems (machines with 8 bit char and two's complement arithmetic), as I know of no compiler implementation that will do the wrong thing here (although MSVC will generate a runtime warning here if integer conversion overflow checking is enabled). – Simon Richter Feb 18 '11 at 12:20
well, this would go wrong on any architecture that does not use two's complement for signed characters, the simple cast would work regardless of the signed number implementation. – wich Feb 18 '11 at 12:51
@wich: I'm getting confused. Are you saying my sample code with casting is the right way to go about things in safe manner? – rbcc Feb 18 '11 at 13:19
wich: yes and no. The "add 256" strategy is for when the data has already been misinterpreted earlier. – Simon Richter Feb 18 '11 at 14:55

score 0 · Answer 5 · answered Feb 18 '11 at 13:22

I'm not 100% sure that I understand your question, so tell me if I'm wrong.

If I got it right, you are reading jbytes that are technically signed chars, but really pixel values ranging from 0 to 255, and you're wondering how you should handle them without corrupting the values in the process.

Then, you should do the following:

Convert jbytes to unsigned char before doing anything else; this will definitely restore the pixel values you are trying to manipulate.
Use a larger signed integer type, such as int, while doing intermediate calculations, to make sure that over- and underflows can be detected and dealt with (in particular, not casting to a signed type could force the compiler to promote every type to an unsigned type, in which case you wouldn't be able to detect underflows later on).
When assigning back to a jbyte, clamp your value to the 0-255 range, convert to unsigned char and then convert again to signed char. I'm not certain the first conversion is strictly necessary, but you just can't be wrong if you do both.

For example:

inline int fromJByte(jbyte pixel) {
    // cast to unsigned char re-interprets values as 0-255
    // cast to int will make intermediate calculations safer
    return static_cast<int>(static_cast<unsigned char>(pixel));
}

inline jbyte fromInt(int pixel) {
    if(pixel < 0)
        pixel = 0;

    if(pixel > 255)
        pixel = 255;

    return static_cast<jbyte>(static_cast<unsigned char>(pixel));
}

jbyte in = ...
int intermediate = fromJByte(in) + 30;
jbyte out = fromInt(intermediate);
