Bitshifting vs array indexing, which is more appropriate for usart interfaces on 32bit MCUs

Question

I have an embedded project with a USART HAL. This USART can only transmit or receive 8 or 16 bits at a time (depending on the usart register I chose i.e. single/double in/out). Since it's a 32-bit MCU, I figured I might as well pass around 32-bit fields as (from what I have been lead to understand) this is a more efficient use of bits for the MPU. Same would apply for a 64-bit MPU i.e. pass around 64-bit integers. Perhaps that is misguided advice, or advice taken out of context.

With that in mind, I have packed the 8 bits into a 32-bit field via bit-shifting. I do this for both tx and rx on the usart.

The code for the 8-bit only register is as follows (the 16-bit register just has half the amount of rounds for bit-shifting):

int zg_usartTxdataWrite(USART_data*         MPI_buffer,
                        USART_frameconf*    MPI_config,
                        USART_error*        MPI_error)
{

MPI_error = NULL;

if(MPI_config != NULL){
    zg_usartFrameConfWrite(MPI_config);
}

HPI_usart_data.txdata = MPI_buffer->txdata;

    for (int i = 0; i < USART_TXDATA_LOOP; i++){
        if((USART_STATUS_TXC & usart->STATUS) > 0){
            usart->TXDATAX = (i == 0 ? (HPI_usart_data.txdata & USART_TXDATA_DATABITS) : (HPI_usart_data.txdata >> SINGLE_BYTE_SHIFT) & USART_TXDATA_DATABITS);
        }
        usart->IFC |= USART_STATUS_TXC;
    }
    return 0;
}

EDIT: RE-ENTERTING LOGIC OF ABOVE CODE WITH ADDED DEFINES FOR CLARITY OF TERNARY OPERATOR IMPLICIT PROMOTION PROBLEM DISCUSSED IN COMMENTS SECTION

(the HPI_usart and USART_data structs are the same just different levels, I have since removed the HPI_usart layer, but for the sake of this example I will leave it in)

#define USART_TXDATA_LOOP 4
#define SINGLE_BYTE_SHIFT 8

typedef struct HPI_USART_DATA{

   ...
   uint32_t txdata;
   ...

}HPI_usart

HPI_usart HPI_usart_data = {'\0'};

const uint8_t USART_TXDATA_DATABITS = 0xFF;

int zg_usartTxdataWrite(USART_data*         MPI_buffer,
                        USART_frameconf*    MPI_config,
                        USART_error*        MPI_error)
{

MPI_error = NULL;

if(MPI_config != NULL){
    zg_usartFrameConfWrite(MPI_config);
}

HPI_usart_data.txdata = MPI_buffer->txdata;

    for (int i = 0; i < USART_TXDATA_LOOP; i++){
        if((USART_STATUS_TXC & usart->STATUS) > 0){
            usart->TXDATAX = (i == 0 ? (HPI_usart_data.txdata & USART_TXDATA_DATABITS) : (HPI_usart_data.txdata >> SINGLE_BYTE_SHIFT) & USART_TXDATA_DATABITS);
        }
        usart->IFC |= USART_STATUS_TXC;
    }
    return 0;
}

However, I now realize that this is potentially causing more issues than it solves because I am essentially internally encoding these bits which then have to be decoded almost immediately when they are passed through to/from different data layers. I feel like it's a clever and sexy solution, but I'm now trying to solve a problem that I shouldn't have created in the first place. Like how to extract variable bit fields when there is an offset i.e. in gps nmea sentences where the first 8 bits might be one relevant field and then the rest are 32bit fields. So it ends up being like this:

32-bit array member 0:

 bits 24-31      bits 15-23          bits 8-15            bits 0-7

32-bit array member 1:

 bits 24-31             bits 15-23           bits 8-15               bits 0-7

32-bit array member 2:

 bits 24-31        15-23 8-15 ...

| 32-bit Value B, bits 0-7 | etc... | .... | .... |

The above example requires manual decoding, which is fine I guess, but it's different for every nmea sentence and just feels more manual than programmatic.

My question is this: bitshifting vs array indexing, which is more appropriate?

Should I just have assigned each incoming/outgoing value to a 32-bit array member and then just index that way? I feel like that is the solution since it would not only make it easier to traverse the data on other layers, but I would be able to eliminate all this bit-shifting logic and then the only difference between an rx or tx function would be the direction the data is going.

It does mean a small rewrite of the interface and the resulting gps module layer, but that feels like less work and also a cheap lesson early on in my project.

Also any thoughts and general experience on this would be great.

tl;dr. If there is an array and you only want to access values on 8 bit boundaries indexing a `char`-array (or a `char*` aliasing that thingy you want to access) is always more appropriate and easier to read than bit-shifting. — Swordfish, Feb 20 '19 at 01:49
Awesome. Thanks for the sanity check. I think I got tunnel-vision on this... bad brain... BAD... go sit in the corner and stop overthinking!! — BitShift, Feb 20 '19 at 01:51
take a rubber duck with you and have a great time in the corner then ;p — Swordfish, Feb 20 '19 at 01:53
Don't write code such as `usart->TXDATAX = (i == 0 ? (HPI_usart_data.txdata & USART_TXDATA_DATABITS) : (HPI_usart_data.txdata >> SINGLE_BYTE_SHIFT) & USART_TXDATA_DATABITS);`. This is dangerous, unreadable and needs to be split in several expressions. — Lundin, Feb 20 '19 at 07:27
@Swordfish A `char` array is entirely unsuitable to use for anything but strings. It should never be used for de-serializing a larger chunk of data. If attempted, the programmer will end up in implicit type promotion hell, when mixing bitwise operators with potentially signed `char`. — Lundin, Feb 20 '19 at 07:40
@Lundin I have used uintN_t datatypes throughout my code. Also what do you mean by this is dangerous? It's a simple ternary operator with some bit-shifting. Everything is in it's own set of parenthesis to avoid any confusing precedence. — BitShift, Feb 20 '19 at 09:20
@Medicineman25 Lundin is talking about integer promotion. There is a problem with `uint8_t`, though, it isn't allowed to alias other types unlike `char` and `char unsigned`. — Swordfish, Feb 20 '19 at 12:01
@Swordfish oh yeh I got the part about integer promotion, hence why I don't use those datatypes... was just confused about his comment that the ternary operator was dangerous. They aren't related are they? — BitShift, Feb 20 '19 at 13:30
@Medicineman25 in my eyes that's no argument to not use `char unsigned`. you just have to know what you're doing. but that is true for all aspects of C. i don't know why the ternary operator should be considered "dangerous". — Swordfish, Feb 20 '19 at 13:34
Lundin's comment about "dangerous" is that the line is difficult to decode at glance. By smushing everything into a single line, you're not optimizing the code in any way (any even half-decent compiler will generate the same assembly for your ternary versus a full up if/else), but you're making all future maintainers of the code step carefully through the line, as there are complex operations in both the if and the else. — Ross, Feb 20 '19 at 14:17
I would disagree strongly with that assertion. Perhaps I could have improved the layout of each statement after every OR, perhaps as a new line and align the OR operators vertically, but the inability to read a simple ternary is no cause for alarm other than identifying an opportunity to teach the person who is unable to read the statement. Anyway I don't wanna start a flame-war, just offering a counter-opinion. Thank you for yours @Lundin :) — BitShift, Feb 21 '19 at 01:21
@Swordfish Of course it is allowed to alias character types or the whole C standard falls apart. And that's usually how `uint8_t` is implemented, as a typedef to `unsigned char`. — Lundin, Feb 21 '19 at 07:42
@Medicineman25 My comment is mostly about "most operators on a single line wins a price". When you write a mess like that line, you have multiple implicit promotions taking place. Something around 8 of them. You took all those in account when you wrote that line? No? So bluntly put, you write code which you don't know what it does - buggy and dangerous. And yes, the ?: operator adds a little bit to that, by balancing the 2nd and 3rd operands against each other regardless of which one that is evaluated. — Lundin, Feb 21 '19 at 07:47
For example if `HPI_usart_data.txdata` is of type `char`, `uint8_t`, `int` or `short`, you have a big portability bug sitting right there, getting wildly different results depending on if right shift is arithmetic or logical for the given system. — Lundin, Feb 21 '19 at 07:52
@Lundin That's not what i said. I said `uintX_t` is not allowed to alias OTHER types. And if `uint8_t` is `char unsigned` or `int8_t` is `char` is not guaranteed and thus implementation-defined and not portable. duh. In general `uint8_t` and `int8_t` are NOT allowed to alias other types. — Swordfish, Feb 21 '19 at 07:55
@Lundin Ah I see... I'm not sure what implicit promotion you are referring to there, I would love to learn more if you could elaborate. For clarity, txdata is of type uint32_t, USART_TXDATA_DATABITS is a const uint8_t assigned the value 0xFF and SINGLE_BYTE_SHIFT is just a preproc define of 8. — BitShift, Feb 23 '19 at 00:24
The way I read the logic is this: "does i equal 0? if not, bitwise AND the 32bit member txdata, with an 8 bit constant, thus extracting 8 bits. else if i > 0, shift the 32bit value to the right 8 bits in each loop iteration and cut off the next 8 bits. Shift the whole value to the right and tack the new 8 bits onto the 32 bit field. — BitShift, Feb 23 '19 at 00:26
However, rereading this logic, I realise that I should have bitshifted the actual struct member by 8 bits, prior assignment, in order to achieve the desired outcome of tacking on a new 8 bits to a 32 bit field. However that is not something that's been pointed out. So for now let's pretend I didn't make that logical error and the logic is sound. Where else is there fault in this ternary operator. I would love to learn more about why this isn't safe or portable. I definitely do not want to assume something is good quality and go about the world writing shitty code willy-nilly... — BitShift, Feb 23 '19 at 00:28

score 2 · Accepted Answer · answered Feb 20 '19 at 07:41

Since it's a 32-bit MCU, I figured I might as well pass around 32-bit fields

That's not really the programmer's call to make. Put the 8 or 16 bit variable in a struct. Let the compiler add padding if needed. Alternatively you can use uint_fast8_t and uint_fast16_t.

My question is this: bitshifting vs array indexing, which is more appropriate?

Array indexing is for accessing arrays. If you have an array, use it. If not, then don't.

While it is possible to chew through larger chunks of data byte by byte, such code must be written much more carefully, to prevent running into various subtle type conversion and pointer aliasing bugs.

In general, bit shifting is preferred when accessing data up to the CPU's word size, 32 bits in this case. It is fast and also portable, so that you don't have to take endianess in account. It is the preferred method of serialization/de-serialization of integers.

Agreed. This is for a use-case where the code is very modular, and the gps layer I'm using will be open sourced, so I need to take into account that not everyone will want to encode every 8/16-bit of data into a 32-bit field. Either way, it's a waste of power to encode and then immediately decode like this. — BitShift, Feb 20 '19 at 09:24

Bitshifting vs array indexing, which is more appropriate for usart interfaces on 32bit MCUs

1 Answers1