Pack bits in a struct in C++ / Arduino

Question

I have a struct:

typedef struct {
  uint8_t month;  //  1..12 [4 bits]
  uint8_t date;   //  1..31 [5 bits]
  uint8_t hour;   // 00..23 [5 bits]
  uint8_t minute; // 00..59 [6 bits]
  uint8_t second; // 00..59 [6 bits]
} TimeStamp;

but I would like to pack it so it only consumes four bytes instead of five.

Is there a way of shifting the bits to create a tighter struct?

It might not seem much, but it is going into EEPROM, so one byte saved is an extra 512 bytes in a 4 KB page (and I can use those extra six bits left over for something else too).

You could save some bytes in data, but how many bytes are you using in code to pack and unpack the data, and how much longer does the code take to run? – cup, May 25 '18 at 18:26
I do not like packing fields across non-byte boundaries. But if I must, I would leave the struct as-is and write a pack function to pack the fields into the smaller data array/field. I would also write an unpack function. This way, if something goes wrong, I can trace it, and it is repeatable and testable. I would not try to create a packed struct first ... simply create a normal struct and pack that. I think the answer provided by @Laurijssen is a very good one and you can write a function around his answer. – Xofo, May 26 '18 at 15:25
@cup As I asked in my question, the data is stored in an EEPROM, not data (RAM) so speed and a few extra bytes used in code are not so important here. — Kent, May 27 '18 at 21:54
The timestamp comes from an MCP79411 RTCC which outputs date/time in BCD format. MCU is an ATMEGA2560 (hence 4k EEPROM). — Kent, May 27 '18 at 22:16

score 33 · Accepted Answer · edited Jan 13 '20 at 19:46

You're looking for bitfields.

They look like this:

typedef struct {
  uint32_t month  : 4;   // 1..12 [4 bits]
  uint32_t date   : 5;   // 1..31 [5 bits]
  uint32_t hour   : 5;   // 00..23 [5 bits]
  uint32_t minute : 6;   // 00..59 [6 bits]
  uint32_t second : 6;   // 00..59 [6 bits]
} TimeStamp;

Depending on your compiler, the members must be declared with a four-byte type (uint32_t in this case) for the struct to fit into four bytes with no padding. With uint8_t, the compiler may pad the members so that no field straddles a byte boundary, which results in a struct of five bytes. Declaring every field with the same type, wide enough to hold the whole struct, is a general rule that should help prevent discrepancies between compilers.
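
As a minimal sketch of how the size claim can be checked at build time, the following repeats the struct above under a different name and adds a compile-time assertion. The names PackedTimeStamp, spare, and makeTimeStamp are illustrative and not part of the original answer, and static_assert assumes a C++11 compiler (recent Arduino toolchains build as C++11 or later).

```cpp
#include <stdint.h>

// Same fields as the struct above, plus a named field for the six
// leftover bits. All names here are hypothetical.
struct PackedTimeStamp {
  uint32_t month  : 4;  // 1..12
  uint32_t date   : 5;  // 1..31
  uint32_t hour   : 5;  // 0..23
  uint32_t minute : 6;  // 0..59
  uint32_t second : 6;  // 0..59
  uint32_t spare  : 6;  // remaining bits, free for other use
};

// Fails the build if the compiler needs more than four bytes.
static_assert(sizeof(PackedTimeStamp) == 4,
              "PackedTimeStamp is expected to occupy 4 bytes");

// Build a packed timestamp from plain values (no range checking).
inline PackedTimeStamp makeTimeStamp(uint8_t month, uint8_t date,
                                     uint8_t hour, uint8_t minute,
                                     uint8_t second) {
  PackedTimeStamp ts;
  ts.month  = month;
  ts.date   = date;
  ts.hour   = hour;
  ts.minute = minute;
  ts.second = second;
  ts.spare  = 0;
  return ts;
}
```

The bit order inside the four bytes is still chosen by the compiler, so data written this way should be read back by code built with the same toolchain.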

Here's an MSDN link that goes a bit in depth into bitfields:

C++ Bit Fields

answered May 25 '18 at 07:20 by babu646 · edited Jan 13 '20 at 19:46 by Peter Mortensen

Since you linked to MSDN, it's worth mentioning that different compilers may also apply additional restrictions: my experience has been that VS will add a new storage unit on a change of declared type, whereas GCC (which Arduino uses) will continue to pack fields of different types into the same storage unit as long as they don't overrun. Always using the same type is still better for consistency of course. – Alex Celeste May 25 '18 at 09:54
Only bitfields of `unsigned int`, `singed int`, and `_Bool` are guaranteed to be supported by all C compilers (`int` is allowed as well, but in the context of bitfields, `int` can be signed or unsigned, depending on the implementation, so there's no point in really using it). GCC supports other types as an extension. `uint32_t` is likely a typedef for `unsigned int`, so this is probably compliant (still better to be explicit and use `unsigned int`). Also, the alignment is unspecified, so in this example, without the last bitfield, there could be 12 bits of padding. – SJL May 25 '18 at 14:02
"Better be explicit and use unsigned int" that sentence hurts my head – Cubic May 25 '18 at 15:00
@SJL The question is about an AVR device, so `unsigned int` is a 16-bit type. `uint32_t` is equivalent to an `unsigned long` on this platform. – nanofarad May 25 '18 at 15:59
@SJL seeing as this question is tagged C++, that's irrelevant. In C++, any integral or enumeration type is allowed: http://en.cppreference.com/w/cpp/language/bit_field – Justin May 25 '18 at 17:25
@Cubic Also the "singed int". Not enough cooling on the processor...? – Graham May 25 '18 at 17:43
@Cubic "Better be correct and use `unsigned int`" would've been more accurate, but I think SJL saying "explicit" here was a reasonable (though poor) gloss for the idea "explicitly choose the type that is actually correct". Using one of the fixed-size ints doesn't even make semantic sense here, because it's saying: "I want a 32-bit unsigned integer but make it 4 bits", which is like saying "I want a red line but make it blue". – mtraceur May 25 '18 at 20:29
@Graham What's wrong with "`signed int`" in this context? As SJL said, *in a bit field*, an unqualified `int` is *not* implicitly `signed` as in the rest of the language, but may be either signed or unsigned, so if you actually wanted signed semantics in a bit-field you need to be explicit there, even if you can follow convention and leave it implied elsewhere in your type declarations. – mtraceur May 25 '18 at 20:35
@SJL To be fair though, for C++ there *is* a time when `unsigned int` is not appropriate (though not encountered here): it cannot be used to portably declare bit fields of more than 16 bits, since the width must be "less or equal the number of bits in the underlying type". In C that means you just can't portably declare bit fields above the natural int width, but in C++ per hexafraction's comment you can, which means that there will be times when using a type other than `int` will be appropriate. – mtraceur May 25 '18 at 20:56
@mtraceur "singed" instead of "signed". I know it's a simple typo, but kind of amusing, especially with the irony of saying "better to be specific" when int is defined by the language as not being specific. – Graham May 25 '18 at 21:34
@Graham Ohhh, my mind completely passed over that typo, so I ended up reaching for the closest explanation that made sense in light of how I misread it. Now that I'm on the same page, I agree, it is amusing. – mtraceur May 25 '18 at 21:40
@mtraceur: "In C that means you just can't portably declare bit fields above the natural int width," - That is hardly surprising given that C doesn't even require `uint32_t` and friends to exist. If you want a non-native width integer, you have to build it yourself because software emulation is expensive and should look expensive. – Kevin May 26 '18 at 06:27
Thanks, I had completely forgotten about bitfields. I also remember allocating an unnamed field as :0 will force boundary alignment (if required) – Kent May 27 '18 at 22:48

score 15 · Answer 2 · answered May 25 '18 at 07:34

Bitfields are one "right" way to do this in general, but why not just store seconds since the start of the year instead? 4 bytes is enough to comfortably store these; in fact, 4 bytes are enough to store the seconds between 1970 and 2038. Getting the other information out of it is then a simple exercise as long as you know the current year, which you could store together with the rest of the information as long as the range of times you're interested in covers less than 70 years. Even then, you could just group timestamps into 68-year ranges and store an offset for each range.
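
A rough sketch of the seconds-since-start-of-year idea follows. The names CalendarTime, daysBeforeMonth, toSecondsOfYear, and fromSecondsOfYear are made up for illustration, and the caller is assumed to know whether the year is a leap year, since the year itself is not stored in the value.

```cpp
#include <stdint.h>

struct CalendarTime { uint8_t month, date, hour, minute, second; };

// Days before each month in a non-leap year (index 0 = January).
static const uint16_t kDaysBeforeMonth[12] = {
  0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334
};

// Days elapsed before the first of `month` (1..12).
static uint16_t daysBeforeMonth(uint8_t month, bool leapYear) {
  return kDaysBeforeMonth[month - 1] + ((leapYear && month > 2) ? 1 : 0);
}

// Seconds since 1 January 00:00:00 of the same year.
uint32_t toSecondsOfYear(const CalendarTime &t, bool leapYear) {
  uint32_t days = daysBeforeMonth(t.month, leapYear) + (t.date - 1);
  return ((days * 24UL + t.hour) * 60UL + t.minute) * 60UL + t.second;
}

// Inverse of toSecondsOfYear for the same leapYear flag.
CalendarTime fromSecondsOfYear(uint32_t s, bool leapYear) {
  CalendarTime t;
  t.second = s % 60; s /= 60;
  t.minute = s % 60; s /= 60;
  t.hour   = s % 24; s /= 24;      // s is now the 0-based day of the year
  uint16_t day = (uint16_t)s;
  uint8_t month = 12;
  // Walk back from December to the month that contains `day`.
  while (month > 1 && day < daysBeforeMonth(month, leapYear)) {
    --month;
  }
  t.month = month;
  t.date  = (uint8_t)(day - daysBeforeMonth(month, leapYear) + 1);
  return t;
}
```

The largest value in a non-leap year is 31,535,999, which needs 25 bits, so seven bits of the 32-bit word remain free.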

answered May 25 '18 at 07:34 by Cubic

one problem with storing seconds like that is that it'll be expensive to calculate the real date, esp. in embedded systems when you don't have hardware division or even multiplication – phuclv May 25 '18 at 14:03
@LưuVĩnhPhúc: If one wants to do any kind of arithmetic involving times (e.g. figure out what the date/time will be 48 minutes from now), having one routine to convert "calendar format" into linear seconds and one to convert linear seconds into calendar format is likely to be more efficient than trying to do calculations with calendar format. If you use March 1, 2003 as a base date, find the calendar date associated with date `d` by initializing `year` to 3, then while `d` is at least 1461 subtract 1461 and add 4 to `year`. Then while `d` is greater than 365, subtract 365 and add 1... – supercat May 25 '18 at 20:29
to `year`. Then subtract out the number of days in each month (starting with March) until `d` goes negative, keeping track of which month caused that to happen. Add 1 to `d` and add 1 to the year if the month was January or February and you're done. As I think about it, some further efficiency improvements may be possible, but the operations involved are well within the range of even a tiny micro. – supercat May 25 '18 at 20:34
[upon further consideration, my intended 2003 simplification doesn't work quite right, so using March 1, 2000 as I did for code I've been using since around 2002 is probably better]. – supercat May 25 '18 at 20:43
@supercat - You mean a variant of Zeller's congruence. Use 30.61 for the month multiplier - 30.6 has rounding errors. Always starts on March 1. – cup May 29 '18 at 12:16

score 12 · Answer 3 · answered May 25 '18 at 09:21

Another solution is to store the values in one 32-bit variable and retrieve the individual items with bit shifting.

uint32_t timestamp = xxxx;

uint8_t month = timestamp & 0x0F;
uint8_t date = (timestamp & 0x1F0) >> 4;
uint8_t hour = (timestamp & 0x3E00) >> 9;
uint8_t minute = (timestamp & 0xFC000) >> 14;
uint8_t second = (timestamp & 0x3F00000) >> 20;
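
The snippet above only reads fields. One possible way to write them, assuming the same bit positions, is sketched below; packTimeStamp, setMinute, and the constant names are hypothetical helpers, not part of the original answer.

```cpp
#include <stdint.h>

// Bit positions matching the masks used in the extraction code above.
static const uint8_t kMonthShift  = 0;   // 4 bits
static const uint8_t kDateShift   = 4;   // 5 bits
static const uint8_t kHourShift   = 9;   // 5 bits
static const uint8_t kMinuteShift = 14;  // 6 bits
static const uint8_t kSecondShift = 20;  // 6 bits

// Combine the five fields into one 32-bit value. Each input is masked
// to its field width so an out-of-range value cannot spill into a neighbour.
uint32_t packTimeStamp(uint8_t month, uint8_t date, uint8_t hour,
                       uint8_t minute, uint8_t second) {
  return ((uint32_t)(month  & 0x0F) << kMonthShift)
       | ((uint32_t)(date   & 0x1F) << kDateShift)
       | ((uint32_t)(hour   & 0x1F) << kHourShift)
       | ((uint32_t)(minute & 0x3F) << kMinuteShift)
       | ((uint32_t)(second & 0x3F) << kSecondShift);
}

// Replace a single field: clear its bits first, then OR in the new value.
uint32_t setMinute(uint32_t timestamp, uint8_t minute) {
  const uint32_t mask = (uint32_t)0x3F << kMinuteShift;  // 0xFC000
  timestamp &= ~mask;
  timestamp |= (uint32_t)(minute & 0x3F) << kMinuteShift;
  return timestamp;
}
```

Keeping the shifts in named constants means a change to one field width is made in one place rather than in every mask.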

answered May 25 '18 at 09:21 by Serve Laurijssen

Which is programmer-driven bitfields. :) – Eric Brown May 25 '18 at 20:30
And don't forget that changing the value of a bitfield with this approach will require twice the number of manual actions: 1) mask out the destination bits, and only then 2) _or_ in the value. And many of those magic constants will require changing if you e.g. expand a single field. In other words, maintenance nightmare. – Ruslan May 26 '18 at 06:55
A nice approach. In C++ I think you can automate which numbers to "and" and shift with by templates to avoid the potential maintenance nightmares pointed out by @Ruslan. – mathreadler May 26 '18 at 08:18
Or you can just *use bitfields*. – Sneftel May 26 '18 at 08:25
I think this is the -most- correct and safe way to do it. From my experience, this method is much more easily tested (in unit testing), and is portable. With a bitfield I would be concerned if the compiler does something different or if an optimization messes something up. – Xofo May 26 '18 at 15:28

score 2 · Answer 4 · answered May 25 '18 at 20:53

If you can deal with two-second accuracy, the MS-DOS timestamp format used 16 bits to hold the date (year-1980 as 7 bits, month as 4, day as 5) and 16 bits for the time (hour as five, minute as six, seconds as five). On a processor like the Arduino, it may be possible to write code that splits values across a 16-bit boundary, but I think code will be more efficient if you can avoid such a split (as MS-DOS did by accepting two-second accuracy).
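
For illustration, the MS-DOS layout described above could be packed roughly as follows. The function names are invented for this sketch. The casts to uint16_t before shifting matter on AVR, where int is only 16 bits wide.

```cpp
#include <stdint.h>

// Date word: bits 15..9 = year - 1980, bits 8..5 = month, bits 4..0 = day.
uint16_t packDosDate(uint16_t year, uint8_t month, uint8_t day) {
  return (uint16_t)(((uint16_t)(year - 1980) << 9)
                  | ((uint16_t)month << 5)
                  | day);
}

// Time word: bits 15..11 = hour, bits 10..5 = minute, bits 4..0 = second / 2.
uint16_t packDosTime(uint8_t hour, uint8_t minute, uint8_t second) {
  return (uint16_t)(((uint16_t)hour << 11)
                  | ((uint16_t)minute << 5)
                  | (second >> 1));
}

// Field extraction for the date word.
uint16_t dosYear(uint16_t d)  { return (uint16_t)(1980 + (d >> 9)); }
uint8_t  dosMonth(uint16_t d) { return (d >> 5) & 0x0F; }
uint8_t  dosDay(uint16_t d)   { return d & 0x1F; }

// Field extraction for the time word; seconds come back rounded down
// to an even value because only second / 2 is stored.
uint8_t dosHour(uint16_t t)   { return (uint8_t)(t >> 11); }
uint8_t dosMinute(uint16_t t) { return (t >> 5) & 0x3F; }
uint8_t dosSecond(uint16_t t) { return (uint8_t)((t & 0x1F) * 2); }
```

No field crosses the boundary between the two 16-bit words, which is the property the paragraph above points to.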

Otherwise, as was noted in another answer, using a 32-bit number of seconds since some base time will often be more efficient than trying to keep track of things in "calendar format". If all you ever need to do is advance from one calendar-format date to the next, the code to do that may be simpler than code to convert between calendar dates and linear dates, but if you need to do much of anything else (even step backward from a date to the previous one) you'll likely be better off converting dates to/from linear format when they're input or displayed, and otherwise simply work with linear numbers of seconds.

Working with linear numbers of seconds can be made more convenient if you pick as a baseline date March 1 of a leap year. Then while the date exceeds 1461, subtract that from the date and add 4 to the year (16-bit comparison and subtraction are efficient on the Arduino, and even in 2040 the loop may still take less time than a single 16x16 division). If the date exceeds 364, subtract 365 and increment the year, and try that up to twice more [if the date is 365 after the third subtraction, leave it].

Some care is needed to ensure that all corner cases work correctly, but even on a little 8-bit or 16-bit micro, conversions can be surprisingly efficient.
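
The year-and-month extraction described above might look like the sketch below. It assumes a 0-based day number counted from 1 March 2000 (the seconds count divided by 86400), so the loop conditions are written as "at least 1461" and "at least 365". CalendarDate and dateFromDayNumber are hypothetical names, and the result is only valid through 28 February 2100 because the code treats every fourth year as a leap year.

```cpp
#include <stdint.h>

struct CalendarDate { uint16_t year; uint8_t month; uint8_t day; };

// Month lengths starting from March. February comes last and is listed
// with 29 days so the leap day falls at the very end of a four-year cycle.
static const uint8_t kMonthLengthFromMarch[12] = {
  31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31, 29
};

// Convert days since 1 March 2000 (day 0) into a calendar date.
// Valid for 0..36523, i.e. through 28 February 2100.
CalendarDate dateFromDayNumber(uint16_t d) {
  uint16_t year = 2000;
  // Strip whole four-year cycles (3 * 365 + 366 = 1461 days).
  while (d >= 1461) { d -= 1461; year += 4; }
  // Strip up to three 365-day years; a leftover of 365 is 29 February.
  for (uint8_t i = 0; i < 3 && d >= 365; ++i) { d -= 365; ++year; }
  // Strip whole months, starting from March.
  uint8_t idx = 0;
  while (d >= kMonthLengthFromMarch[idx]) {
    d -= kMonthLengthFromMarch[idx];
    ++idx;
  }
  CalendarDate out;
  // idx 0..9 are March..December; idx 10 and 11 are January and
  // February of the following calendar year.
  out.month = (uint8_t)((idx < 10) ? idx + 3 : idx - 9);
  out.year  = (uint16_t)((idx >= 10) ? year + 1 : year);
  out.day   = (uint8_t)(d + 1);
  return out;
}
```

Only 16-bit comparisons and subtractions are used, with no multiplication or division.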
