C function replace a byte in a specific index of a parameter

Question

So I'm using the following code:

unsigned long replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
{
    int shift = 8 * indexToReplace;
        unsigned long value = newByte << shift;
        unsigned long mask = 0xff << shift;

        return (~mask & original) | value;
}

I have a given word with |w| bytes.

Bytes are numbered from 0(least significant) to w/8-1(most significant).

For example:

replaceByte(unsigned long original, unsigned char newByte, int indexToReplace)
correct answer:
replaceByte(0x12345678CDEF3456, 0xAB, 2) --> 0x1234AB78CDEF3456
                       (my code's output is: 0x12345678CDAB3456)
correct answer:
replaceByte(0x12345678CDEF3456, 0xAB, 0) --> 0xAB345678CDEF3456
                       (my code's output is: 0x12345678cdef34AB)

I thought I need to check whether the system is a big endian or a little endian because my code changes the exact opposite bytes. Lets say for example it changes the MSB instead of the LSB. But... I realized that it does not matter because i'm working with bits.

As you can see, the code here changes the wrong index:

(!) Error in index: 0. Output: 0x123456789abcdeff
Answer: 0xff3456789abcdeab

 (!) Error in index: 1. Output: 0x123456789abcffab
Answer: 0x12ff56789abcdeab

(!) Error in index: 2. Output: 0x123456789affdeab
Answer: 0x1234ff789abcdeab

 (!) Error in index: 3. Output: 0xffffffffffbcdeab
Answer: 0x123456ff9abcdeab

 (!) Error in index: 4. Output: 0x123456789abcdeff
 Answer: 0x12345678ffbcdeab

(!) Error in index: 5. Output: 0x123456789abcffab
 Answer: 0x123456789affdeab

 (!) Error in index: 6. Output: 0x123456789affdeab
 Answer: 0x123456789abcffab

Well, I thought about changing my code to something with arrays, just to get a number -> run on it as an array -> change the needed index -> and that's it. But.. I couldn't write it correctly so I stick to the shifting thing (which I can't write correctly as well). This is my attempt:

    unsigned long replaceByte(unsigned long original, unsigned char newByte, int indexToReplace){
    int size = (sizeof(unsigned long));
char a[size];
for (int i=0; i<size; i++){
if (i=0)
a[0] = original & 0xff;
else
a[i] = original>>(8*i) & 0xff;
}
a[indexToReplace] = newByte;
......// stuck
 }

I'm not allowed to use long long, uint_fast64_t or reinterpret_cast or any other "externals" things.

I also think I need to change somehow if the code is running on a 32 bit system or a 64 one in order to determine which size is the unsigned long (4 or 8 bytes).

You should remove one of your less necessary tags and tag your question with your programming language. That will ensure it gets seen by the greatest number of users who might be able to answer it. — Nick, Nov 20 '20 at 21:34
To start, `mask` and `value` are `int`. They truncate the values and need to be `unsigned long` (the same type as `original `). — Craig Estey, Nov 20 '20 at 22:15
Works on 64 bit system because `long` is 64 bit. For 32 bit system use `unsigned long long`. That works for both 32 and 64 — Craig Estey, Nov 20 '20 at 22:21
@CraigEstey Thanks I fixed the int to unsigned long thing, although it did not fix or change anything. Still gives the same wrong output. and I can't use long long. (I'm not allowed to change the function signature). — Dima Ciun, Nov 20 '20 at 22:25
Try `0xffL` instead of `0xff`. The latter will do a 32 bit shift and truncate due to promotion rules. The fix forces 64 bit — Craig Estey, Nov 20 '20 at 22:34
@CraigEstey nope.. I think it's related to the big/little endian but I can't figure it out — Dima Ciun, Nov 20 '20 at 22:40
`replaceByte(0x12345678CDEF3456, 0xAB, 2) --> 0x1234AB78CDEF3456` Huh? You have the correct answer wrong. You said bytes are numbered such that zero is the least significant byte. How does 2 get you to here 0x1234*AB*78CDEF3456? — David Schwartz, Nov 20 '20 at 23:18
@DavidSchwartz that's because the answers are given by little endian machine imo, which store first the last byte. — Dima Ciun, Nov 20 '20 at 23:29
@DimaCiun No. That is not correct. Which is the last significant byte is endianness independent. Whether you store the decimal number one hundred and twenty three as "1 2 3" or "3 2 1", the most significant digit is the 1 and the least significant digit is the 3. *Where* you store the least significant digit doesn't change which digit is least significant. Endianness only affects the order in which bytes are stored in memory, it doesn't change the value of the most significant byte as that is a property of the *value*. — David Schwartz, Nov 21 '20 at 04:39
@DimaCiun endianness is **completely irrelevant** here because it's how values are stored in memory. Since you don't have any pointers and just operate on values, the endian doesn't affect the result — phuclv, Nov 21 '20 at 06:45

Craig Estey · Accepted Answer · 2020-11-21T01:06:27.840

This is prefaced by [my] top comments.

value and mask need to be unsigned long.

Also, when doing the shift, both values are/were getting truncated [to 32 bit] due to expression promotion rules.

In the above, I forgot about value having the same problem.

Here's an alternate way to force correct shifting:

unsigned long
replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
{
    int shift = indexToReplace*8;
    unsigned long value = newByte;
    unsigned long mask = 0xff;

    value <<= shift;
    mask <<= shift;

    return (~mask & original) | value;
}

The above is what I usually do. But, the following may also work:

unsigned long
replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
{
    int shift = indexToReplace*8;
    unsigned long value = ((unsigned long) newByte) >> shift;
    unsigned long mask = ((unsigned long) 0xff) >> shift;

    return (~mask & original) | value;
}

UPDATE:

hey thanks. The provided codes bring me the following output: 0x12345678cdef34AB instead of 0xAB345678CDEF3456. I'm pretty sure it's related to the little endian thing because it's not a coincidence that instead of the MSB the LSB gets replaced.

It's not an endian thing. It's how indexToReplace needs to be interpreted.

The processor fetches according to the endian mode in effect, so by the time we try to do the shift, the value in the processor register is always big endian [so, no worries]

The normal/usual is that the index starts from the right. But, according to the [correct] data, the problem wants the index to be from the left.

So, we just need to adjust the index/shift:

unsigned long
replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
{
#if 0
    int shift = indexToReplace * 8;
#else
    int shift = ((sizeof(unsigned long) - 1) - indexToReplace) * 8;
#endif
    unsigned long value = newByte;
    unsigned long mask = 0xff;

    value <<= shift;
    mask <<= shift;

    return (~mask & original) | value;
}

UPDATE #2:

It recognises the "int shift = indexToReplace * 8;" as a comment for some reason, but it still works.

That is because #if 0 is a CPP [preprocessor] statement. It is interpreted in a manner similar to #ifdef NEVERWAS where we never do a #define NEVERWAS, so the code under the #else is what is included.

You may wish to use the -E and/or -P options when compiling to see the output of the preprocessor stage.

In this instance, the only thing that the compiler will see is:

int shift = ((sizeof(unsigned long) - 1) - indexToReplace) * 8;

BUT if I try to change the "#if 0" to "#if (is_big_endian == 0)" I get a wrong result when I use "0" as the indexToReplace.

Please try to get beyond referring to this as endian related. Once again, that is not what is happening. The code I've posted works regardless of the processor endian mode.

Please reread the part about the correct/proper interpretation of the byte index. It is how one choses to number the bytes.

Once again, 99.44% of the time, it is oriented from the right (LSB to MSB). Graphically, most people use:

| MSB |     |     |     |     |     |     | LSB |
|  01 |  23 |  45 |  67 |  89 |  AB |  CD |  EF | DATA
|   7 |   6 |   5 |   4 |   3 |   2 |   1 |   0 | INDEX

However, for your exact problem statement, it is oriented from the left (MSB to LSB):

| MSB |     |     |     |     |     |     | LSB |
|  01 |  23 |  45 |  67 |  89 |  AB |  CD |  EF | DATA
|   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 | INDEX

This is unusual. It is also slower because the calculation of the shift is more complex.

It gives out: 0x12345678CDEF34FF instead of 0xFF345678CDEF3456

Ultimately, whatever you did to the #if, it chose the incorrect equation.

hey thanks. The provided codes bring me the following output: 0x12345678cdef34AB instead of 0xAB345678CDEF3456 for this input: (0x12345678CDEF3456, 0xAB, 0). I'm pretty sure it's related to the little endian thing because it's not a coincidence that instead of the MSB the LSB gets replaced. — Dima Ciun, Nov 20 '20 at 23:05
Wםw! Thanks alot Craig, it works! What does "if 0" checks? my is_big_endian function? So basically you already implanted here the condition that checks whether it's a little or a big endian machine. If it's little endian : shift = indexToReplace * 8, and if it's a big endian so the minus 1 minus indexToRplace. Did I get it correctly? — Dima Ciun, Nov 20 '20 at 23:44
It recognises the "int shift = indexToReplace * 8;" as a comment for some reason, but it still works. BUT if I try to change the "#if 0" to "#if (is_big_endian == 0)" I get a wrong result when I use "0" as the indexToReplace. It gives out: 0x12345678CDEF34FF instead of 0xFF345678CDEF3456. — Dima Ciun, Nov 20 '20 at 23:50
Amazing, I've learnt a lot from your answer, and I appreciate that you invested so much time, thank you! Last question, you told me that "The processor fetches according to the endian mode in effect, so by the time we try to do the shift, the value in the processor register is always big endian". Which says, and correct me if I'm wrong, that there IS an endian thing here although you tell me to forget about the endianness thing. You just told me it always treats it as a big endian, or I'm missing out something? — Dima Ciun, Nov 21 '20 at 10:31
Endian _only_ refers to the stored byte order in _memory_. Since you are dealing with `unsigned long` you can _not_ know the endian of the memory. See: https://en.wikipedia.org/wiki/Endianness and the diagram in the Example section in particular. — Craig Estey, Nov 21 '20 at 16:46

C function replace a byte in a specific index of a parameter

1 Answers1