
The Standard allows one to choose between an integer type, an enum, and a std::bitset.

Why would a library implementor use one over the other given these choices?

Case in point: LLVM's libc++ appears to use a combination of (at least) two of these implementation options:

ctype_base::mask is implemented using an integer type: <__locale>

regex_constants::syntax_option_type is implemented using an enum + overloaded operators: <regex>

The GCC project's libstdc++ uses all three:

ios_base::fmtflags is implemented using an enum + overloaded operators: <bits/ios_base.h>

regex_constants::syntax_option_type is implemented using an integer type, and regex_constants::match_flag_type is implemented using a std::bitset.
Both: <bits/regex_constants.h>

AFAIK, gdb cannot "detect" the bitfieldness of any of these three choices so there would not be a difference wrt enhanced debugging.

The enum solution and integer type solution should always use the same space. std::bitset does not seem to guarantee that sizeof(std::bitset<32>) == sizeof(std::uint32_t), so I don't see what is particularly appealing about std::bitset.

The enum solution seems slightly less type safe because a combination of the masks does not itself name an enumerator.

Strictly speaking, the aforementioned is with respect to n3376 and not FDIS (as I do not have access to FDIS).

Any available enlightenment in this area would be appreciated.

Xeo
user1290696
  • n3376: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3376.pdf – user1290696 Mar 25 '12 at 01:51
  • N3376 is just a revision of the C++11 standard that fixes minor editorial issues. There is no difference in the content. – Xeo Mar 25 '12 at 01:53
  • I am quite aware, just wanted to be clear. Thanks for the help with those links :) – user1290696 Mar 25 '12 at 01:55
  • I suppose you already know that, but I think it bears repeating: more often than not, enumeration types can hold many more values than just the set of their enumerators, so binary operations can be just fine on them. It's in fact easier to ensure that it works since we can know specify an explicit underlying type. – Luc Danton Mar 25 '12 at 14:42
  • Whatever guarantees the Standard makes about any of these alternatives, an implementer can add more integral types, enumeration type features, or additional constraints to `bitset`. But these types aren't an efficiency concern so no effort is likely to be made. – Potatoswatter Mar 27 '12 at 04:19

3 Answers


My preference is to use an enum, but there are sometimes valid reasons to use an integer. Usually ctype_base::mask interacts with the native OS headers, with a mapping from ctype_base::mask to the implementation-defined <ctype.h> constants such as _CTYPE_L and _CTYPE_U used for isupper, islower, etc. Using an integer might make it easier to use ctype_base::mask directly with native OS APIs.

I don't know why libstdc++'s <regex> uses a std::bitset. When that code was committed I made a mental note to replace the integer types with an enumeration at some point, but <regex> is not a priority for me to work on.

Jonathan Wakely

The really surprising thing is that the standard restricts it to just three alternatives. Why shouldn't a class type be acceptable? Anyway…

  • Integral types are the simplest alternative, but they lack type safety. Very old legacy code tends to use these, as they are also the oldest option.
  • Enumeration types are safe but cumbersome, and until C++11 they tended to be fixed to the size and range of int.
  • std::bitset may have somewhat more type safety, in that bitset<5> and bitset<6> are different types and addition is disallowed, but otherwise it is unsafe much like an integral type. This wouldn't be an issue if they had allowed types derived from std::bitset<N>.

Clearly enums are the ideal alternative, but experience has proven that the type safety is really unnecessary. So they threw implementers a bone and allowed them to take easier routes. The short answer, then, is that laziness leads implementers to choose int or bitset.

It is a little odd that types derived from bitset aren't allowed, but really that's a minor thing.

The main specification that clause provides is the set of operations defined over these types (i.e., the bitwise operators).

Potatoswatter
  • Nowhere in the standard does it say that these types are restricted to **only** these three alternatives. These are just the three that the OP is talking about. As for type-safety being unnecessary, that's most certainly a fact *not* in evidence... – Nicol Bolas Mar 27 '12 at 05:10
  • @NicolBolas The cited paragraph says "Each bitmask type can be implemented as an enumerated type that overloads certain operators, as an integer type, or as a bitset (20.5)." That sounds like an exclusive list of only three alternatives. – Potatoswatter Mar 27 '12 at 06:12
  • @NicolBolas If type safety between different types of flags were important, libraries wouldn't use integral types for them. But OP gave such an example. So yes, the evidence points to little importance being placed on that. You need to cool down a bit. – Potatoswatter Mar 27 '12 at 06:14
  • _What_ is it that is supposedly restricted to these three types? Do we even know what we are talking about? – HelloGoodbye Dec 19 '18 at 12:16

Why would the standard allow different ways of implementing the library? And the answer is: Why not?

As you have seen, all three options are obviously used in some implementations. The standard doesn't want to make existing implementations non-conforming, if that can be avoided.

One reason to use a bitset could be that its size fits better than an enum or an integer. Not all systems even have a std::uint32_t. Maybe a bitset<24> will work better there?

Bo Persson
  • Another question is why mix and match, which does not make much sense to me. This also does not explain `enum` vs. integer type, which libcxx, a new implementation, uses. While it is true that `std::uint32_t` may not exist on a platform, `uint_least32_t` is guaranteed to exist on all platforms. I cannot imagine that this is a plausible reason either, because none of them go as far as using `: uint_least8_t` on their `enum`-based bitmasks. – user1290696 Mar 25 '12 at 07:18
  • In the case of gcc, different parts of the library are written by different people, and at different times. That might make for different choices. I bet `ios_base` was designed at least 10 years before `regex`. And even though `uint_least32_t` exists on all platforms, we don't know its size. It might sometimes be mapped to `unsigned long long`, which could be overkill. – Bo Persson Mar 25 '12 at 08:13
  • I agree with the bitset point so far as why it is an implementation option. This could be reasonable if `std::bitset` were implemented using an array of type `char`, but it uses an array of type `long` in libstdc++-v3. – user1290696 Mar 25 '12 at 09:57