#include <stdint.h>

struct Type {
    uint8_t var : 3;
};

int main(void)
{
    struct Type bar;
    bar.var = 1;
    uint8_t baz = bar.var << 5;
}

According to the standard, left-shifting by an amount greater than or equal to the width of the promoted left operand is undefined behavior:

6.5.7 Bitwise shift operators, paragraph 3: The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

But what about bit fields? Isn't it at least eight bits here?

Peter Cordes
Iman
  • `bar.var` is going to get promoted to an `int` before the shift happens, so no UB. Not sure if there is a general answer – NathanOliver Aug 03 '22 at 16:16
  • I tried gcc 12 on `uint32_t baz = bar.var << 9` and got 512, so it is promoting to more than the left operand type. – stark Aug 03 '22 at 16:22
  • @stark Trying something and observing behaviour X is not proof that something is not Undefined Behaviour. If it is UB, the standard allows any outcome, including one that looks sane to you. – marcelm Aug 04 '22 at 17:27

2 Answers


The passages quoted below refer to the integer promotion of the left operand, so start with the relevant promotion rule:

6.3.1.1.2 [...] If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; [...]

The promoted left operand is an int.


About shifting, the spec says

6.5.7.3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

The width of the promoted left operand — the width of an int — is at least 16. 5 is much less than 16.

No undefined behaviour yet.


The spec goes on:

6.5.7.4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

The "type of E1" refers to the type of bar.var after promotion.

E1 has a signed type. In this case, E1 can't possibly be negative, and no value of E1 (at most 7 for a 3-bit unsigned field) multiplied by 2^5 would exceed what an int can represent: 7 × 32 = 224, and INT_MAX is at least 32767.

No undefined behaviour yet.


Finally, we have the assignment.

6.5.16.1.2 In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

6.3.1.3.2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

No undefined behaviour there either.

ikegami
  • The left-hand side does not have an unsigned type. Because of the promotion to `int`, it is signed. – Eric Postpischil Aug 03 '22 at 18:23
  • There's possible ambiguity if you *just* look at what it says about the types of `E1 << E2`. But if you look at the big picture, it's clear they must mean the type after promotion. No other case in C ever cares what type a value was originally promoted from. The "before promotion" interpretation would mean that the shift result is modulo-reduced back to the range of `uint8_t`, but still has type `int` because the promotion rules do apply. – Peter Cordes Aug 04 '22 at 05:16
  • So the standard means the type of E1 *after* integer promotion, with the promoted values being the operands to the shift operation itself. That's what real compilers do, and it has an observable difference in a non-UB case like `((uint8_t)0xff) << 7` producing the same result as `0xff << 7`, not `(int)(uint8_t)(0xff << 7) = 0x80`. https://godbolt.org/z/Ta5bjsov6 shows GCC doing x<<7 as movzx / sal 32bit. The other interpretation would also defeat the design intent of not requiring narrow ALU operations, only working with values that have been extended to a register width (e.g. while they were loaded from memory). – Peter Cordes Aug 04 '22 at 05:18

According to the C11 standard section 6.7.2.1.5:

"A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type. It is implementation-defined whether atomic types are permitted."

This means that despite var being defined as only 3-bits wide in struct Type its type is still uint8_t, so when it is used in the expression bar.var << 5, integer promotion rules apply as would be expected for its underlying type.

This means that the value of bar.var is implicitly promoted to int, in accordance with the integer promotion rules for values that type int can represent. The shift is performed in an int that is at least 16 bits wide, and the result is then implicitly converted back to uint8_t and stored in baz. The operation is therefore fully defined by the standard.

Willis Hershey
  • *its type is still `uint8_t`*: I'm afraid the type of a bit-field is not necessarily the declared type. But it does not matter in this case since integer promotions are performed on the operands of `<<` so `bar.var` is promoted to `int`, whose width is greater than 6, so the behavior of `bar.var << 5` is fully defined. – chqrlie Aug 03 '22 at 17:02
  • @WillisHershey — see the quote from the standard in Ikegami's answer. Both operands are promoted (yes), but the LH operand is not promoted to match the RH operand (unlike, say, `+`). – Jonathan Leffler Aug 03 '22 at 17:02
  • @chqrlie how do you figure that the type of a bitfield is not necessarily its declared type? – Willis Hershey Aug 03 '22 at 17:04
  • From what I'm reading `uint8_t` is not guaranteed to be an acceptable type for a bitfield, but the standard allows for implementation defined types, so if the code is compiling I think we can assume the type of `bar.var` is `uint8_t` – Willis Hershey Aug 03 '22 at 17:12
  • For a bit-field, the type of `int field : 6;` may be treated as `signed int` or `unsigned int`. See [§6.7.2.1 Structure and union specifiers ¶10](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p10) and footnote 125. [¶5](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p5) says you're correct that `uint8_t` need not be acceptable as the type for a bit-field, but I'm not aware of a compiler that places limitations on the integer types used for bit-fields. – Jonathan Leffler Aug 03 '22 at 17:23
  • That's fair, but `signed int field : 6;` and `unsigned int field : 6` do not get the same leeway, and `uint8_t` has an explicit unsigned-ness to it – Willis Hershey Aug 03 '22 at 17:26
  • The value of bit-field member defined as `unsigned int var: 3` would be promoted to `int`, not `unsigned int` as might be supposed. – Ian Abbott Aug 03 '22 at 17:27
  • Yes but only because a 3-bit wide `unsigned int` is guaranteed to fit in type `int` – Willis Hershey Aug 03 '22 at 17:28
  • Yes, the "explicit unsigned-ness" is only relevant during assignment to (or initialization of) the member. – Ian Abbott Aug 03 '22 at 17:30
  • But by contrast an `unsigned int field : 16;` on a system with 16-bit `int`s would have to be promoted to `unsigned int` because a 16-bit `int` cannot represent 65,535 on such a system while a 16-bit wide `unsigned int` can – Willis Hershey Aug 03 '22 at 17:43
  • @WillisHershey: the type of `bar.var` is compiler dependent: it can be tested using the `_Generic` selector. e.g.: `printf("%s\n", _Generic(bar.var, uint8_t: "uint8_t", default: "other"))` will output `uint8_t` with clang and `other` with gcc. Completing the `_Generic` selector with all basic types still reports `other` with gcc. For this compiler, `bar.var` has a 3-bit unsigned integer type, a kind of type that C23 will add as `unsigned _BitInt(3)`. – chqrlie Aug 04 '22 at 06:49
  • @WillisHershey: C23 **6.2.5 Types** *A bit-precise signed integer type is designated as `_BitInt(N)` where `N` is an integer constant expression that specifies the number of bits that are used to represent the type, including the sign bit. Each value of `N` designates a distinct type. There may also be implementation-defined extended signed integer types. The standard signed integer types, bit-precise signed integer types, and extended signed integer types are collectively called signed integer types.* – chqrlie Aug 04 '22 at 06:53
  • *[...] For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.* – chqrlie Aug 04 '22 at 06:54
  • But sadly, C23 does not make further requirements on bit-fields, so the actual type of `bar.var` will remain implementation defined. – chqrlie Aug 04 '22 at 07:04