Understanding if my code takes care of sign extension

Question

I have a small snippet of code that I have written, I am a bit unsure if sign extension has been properly implemented here

Essentially

I have an array of char* data that contains values in bytes

Now I wish to construct a 16 bit (2 byte) value from this data array

I do this by getting the byte @ address 0 and address 1 and combining them.

This part is working and I am sure of it

Now this is a 16 bit value, which I aim to store in my signed integer array. And obviously integers are 32 bits (4 bytes) thus will need to be padded with an additional 16 bits

Now my question is this

void store_data(unsigned char* data){
    int param1 = 5;
    int param2 = 6;
    int address = array_of_ints[param2] + 50;
    int value = memory[address] | (memory[address+1] << 8);
    int32_t sign  = value;
    array_of_ints[param1] = sign;
}

Has my code been properly sign extending the value from memory

Since I am storing it as a signed 32_int I would not need to pad it with an additional 16 bits right?

If `int` is 32 bits then there is no need to store it in variable `value`, just put it directly into `array_of_ints` it will be cast to `int` all the same. if the data is unsigned then it needs to be within the range of signed values to not overflow and flip the sign. — Motomotes, Apr 02 '23 at 02:07
What's `data` used for? Where does `memory` come from? Or did you mean for these to be the same? — John Bollinger, Apr 02 '23 at 04:21

score 0 · Answer 1 · answered Apr 02 '23 at 02:20

Your answer depends on how memory has been defined. You suggest, in the text, that it is an array of char; chars are typically ( but not definitively ) signed, so no, your code would have some odd values.

That should be relatively easy to test; put values like { 0, 1, 0x7f, 0x80, 0x81 }; at some locations in memory, and observe the output of printing your variables.

One subtle problem is with your shift and or to make a 16bit value. Your expression: memory[address] | (memory[address+1] << 8) by type looks like (signed char) | (signed int). What you want is (unsigned char) | (unsigned short). Most importantly you have to suppress the sign extension from memory[]. You probably want it to be unsigned.

It is much preferred to use <inttypes.h> and the corresponding (int8_t, uint8_t, int16_t, ..., uint64_t) than assuming int or short sizes.

score 0 · Answer 2 · answered Apr 02 '23 at 04:50

You are overthinking it. Generally speaking, you don't need to make special provisions for sign extension. It happens automatically. Assignments in C are value-preserving as long as the value assigned is representable in the type of the destination lvalue.

So, if your int is only 16 bits wide, or if your memory has a signed type, then you have an issue already with

memory[address+1] << 8

The (sub-)expression memory[address+1] is promoted to type int for the operation, and the result has type int already. If you were going to have any issues, then they would manifest already in the computation of this expression.

Taking the name array_of_ints to accurately indicate the types involved, you can just assign the | expression directly to an array element without introducing any temporaries:

    array_of_ints[param1] = memory[address] | (memory[address+1] << 8);

The temporaries don't do anything useful for you.

Since I am storing it as a signed 32_int I would not need to pad it with an additional 16 bits right?

How, exactly, do you imagine you would go about manually padding the result? Perhaps you are confused about the nature of bitwise operations. These are operations on values, not (directly) on object representations. Among the implications is that you are correct: if the computation you're already performing is well defined in the first place, then you don't need to take any extra steps to get the most-significant bits filled with zeroes.

Understanding if my code takes care of sign extension

2 Answers2