In a couple of projects of mine I have had an increasing need to deal with contiguous sequences of bits in memory - efficiently (*). So far I've written a bunch of inline-able standalone functions, templated on the choice of a "bit container" type (e.g. uint32_t), for getting and setting bits, applying 'or' and 'and' to their values, locating the container, converting lengths in bits to sizes in bytes or lengths in containers, etc. ... it looks like it's class-writing time.
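To give a rough idea of the kind of helpers meant here, a minimal sketch follows. The names (`bits_per_container`, `containers_for_bits`, `bytes_for_bits`, `get_bit`, `set_bit`) are hypothetical and chosen only for illustration; this is not the actual code from those projects, and it assumes the container type is an unsigned integer.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// All names below are hypothetical; C is assumed to be an unsigned integer type.
template <typename C>
constexpr std::size_t bits_per_container() noexcept {
    return sizeof(C) * CHAR_BIT;
}

// Number of containers needed to hold num_bits bits (rounded up).
template <typename C>
constexpr std::size_t containers_for_bits(std::size_t num_bits) noexcept {
    return (num_bits + bits_per_container<C>() - 1) / bits_per_container<C>();
}

// Number of bytes occupied by those containers.
template <typename C>
constexpr std::size_t bytes_for_bits(std::size_t num_bits) noexcept {
    return containers_for_bits<C>(num_bits) * sizeof(C);
}

// Read the bit at position pos, counting from the least significant bit
// of the first container.
template <typename C>
inline bool get_bit(const C* data, std::size_t pos) noexcept {
    const std::size_t shift = pos % bits_per_container<C>();
    return ((data[pos / bits_per_container<C>()] >> shift) & C{1}) != 0;
}

// Write the bit at position pos.
template <typename C>
inline void set_bit(C* data, std::size_t pos, bool value) noexcept {
    const std::size_t shift = pos % bits_per_container<C>();
    const C mask = static_cast<C>(C{1} << shift);
    C& word = data[pos / bits_per_container<C>()];
    word = value ? static_cast<C>(word | mask) : static_cast<C>(word & ~mask);
}
```

Every call site has to pass the raw pointer and compute positions itself, which is the bookkeeping a class would take over.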

I know the C++ standard library has a specialization of std::vector<bool>, which is considered by many to be a design flaw - as its iterators do not expose actual bools, but rather proxy objects. Whether that's a good idea or a bad one for a specialization, it's definitely something I'm considering - an explicit bit proxy class, which will hopefully "always" be optimized away (with a nice greasing-up with constexpr, noexcept and inline). So, I was thinking of possibly adapting std::vector code from one of the standard library implementations.

On the other hand, my intended class:

  • Will never own the data / the bits - it'll receive a starting bit container address (assuming alignment) and a length in bits, and won't allocate or free.
  • It will not be able to resize the data, dynamically or otherwise - not even while retaining the same amount of space, like std::vector::resize(); its length will be fixed during its lifespan/scope.
  • It shouldn't know anything about the heap (and should work when there is no heap).

In this sense, it's more like a span class for bits. So maybe start out with a span then? I don't know, spans are still not standard; and there are no proxies in spans...

So what would be a good basis (edit: NOT a base class) for my implementation? std::vector<bool>? std::span? Both? None? Or - maybe I'm reinventing the wheel and this is already a solved problem?

Notes:

  • The bit sequence length is known at run time, not compile time; otherwise, as @SomeProgrammerDude suggests I could use std::bitset.
  • My class doesn't need to "be-a" span or "be-a" vector, so I'm not thinking of specializing any of them.

(*) - So far not SIMD-efficiently but that may come later. Also, this may be used in CUDA code where we don't SIMDize but pretend the lanes are proper threads.

einpoklum
  • [`std::bitset`](http://en.cppreference.com/w/cpp/utility/bitset)? And I don't really recommend specializing `std::vector`, since then you're basically just going to reimplement `std::vector`. Instead you might want to create your own class that fits your requirements better, and which can be open-ended enough to incorporate your future plans. – Some programmer dude Jun 13 '18 at 22:01
  • @Someprogrammerdude: See the "notes" in my edit. – einpoklum Jun 13 '18 at 22:03
  • From what I'm imagining, I would start with the `std::span` implementation - which can be found online - and borrow the bit proxy from `vector`. – Drew Dormann Jun 13 '18 at 22:15
  • Why not just store the raw ptr and the length in your class and then provide some functions to check for bits? Since it's non-owning I see no need for a copy. – skeller Jun 13 '18 at 22:28
  • @skeller: I'm not sure I understand your comment. I don't intend to copy anything... – einpoklum Jun 13 '18 at 22:30
  • Will you modify the bits through the non-owning container? – BeeOnRope Jun 14 '18 at 01:38
  • @BeeOnRope: Yes, I did say "setting bits"... – einpoklum Jun 14 '18 at 07:14
  • Your question isn't clear to me. You say you already have a bunch of standalone functions which implement this on top of a "bit container" class (here is bit container just something like `uint64_t` - or is it more complex?) - so it seems like you already have your solution? Basically you just want to move these standalone functions inside a class which maintains a pointer to the array of bit containers? You also ask if `std::vector` or `std::span` should serve as a "basis" for your class, but what does "basis" mean here? Are you talking about using one ... – BeeOnRope Jun 14 '18 at 17:40
  • ... of those classes as part of the implementation (you've ruled out inheritance, but of course composition is still in play), or do you mean "as design inspiration"? Clearly `std::vector` is out in terms of actual implementation, since it owns its storage: whereas you are clear that you want something span/view-like. I think a clearer question would show the functions you have already, and ask about a specific decision regarding say iterator design (if you want iterators at all) or the use of proxy objects to represent one bit. – BeeOnRope Jun 14 '18 at 17:42
  • @BeeOnRope: 1. bit container - my wording was a bit misleading. It's just an integral type, like you suspect. 2. What is a "base"? Partly code-copying, partly design inspiration. 3. I explained roughly what my functions are about, I don't want to swallow the bait, post some code and get people to comment on my code. – einpoklum Jun 14 '18 at 17:57
  • @einpoklum - it still isn't clear to me why you don't wrap your existing standalone methods that suit your needs and have been developed with your use case in mind into a class, but instead present this as a choice between `vector` or `span` - but anyways assuming you want to start from scratch rather than use your methods, I took a shot at an answer. – BeeOnRope Jun 15 '18 at 03:02
  • @BeeOnRope: Because I want to use a proxy, so the code will be structured a bit differently. But thanks. – einpoklum Jun 15 '18 at 09:00
  • @einpoklum - right, I see. If you definitely want to use a proxy it would be a good thing to include in the question. In any case, `std::bitset` exposes a proxy object to allow per-bit modification, so I think it makes a great candidate. – BeeOnRope Jun 15 '18 at 14:56
  • I vote for an impl based on `std::span`; basically a wrapper around a range of objects. Even if it doesn't make it into the finalized standard, `std::span` has at least been vetted as a really good idea by the committee, so much so that it will likely remain in the std for C++20 – AndyG Jun 18 '18 at 13:18

1 Answer

Rather than std::vector or std::span I suspect an implementation of your class would share more in common with std::bitset, since it is pretty much the same thing, except with a (fixed) runtime-determined size.

In fact, you could probably take a typical std::bitset implementation and move the <size_t N> template parameter into the class as a size_t size_ member (or whatever name you like) and you'll have your dynamic bitset class with almost no changes. You may want to get rid of anything you consider cruft, like the constructors that take std::string and friends.

The last step is then to remove ownership of the underlying data: basically you'll remove the creation of the underlying array in the constructor and maintain a view of an existing array with some pointers.
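As a minimal sketch of where those steps might end up (not adapted from any actual standard library source): the class name `bit_view`, its member names, and the least-significant-bit-first ordering are all assumptions made for illustration. The size is a runtime member, the storage is only viewed, and a nested `reference` proxy plays the role of `std::bitset<N>::reference`.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical class; C is assumed to be an unsigned integer type.
template <typename C = std::uint64_t>
class bit_view {
public:
    static constexpr std::size_t bits_per_word = sizeof(C) * CHAR_BIT;

    // Proxy standing in for a single bit of the viewed storage.
    class reference {
    public:
        constexpr reference(C* word, C mask) noexcept : word_(word), mask_(mask) {}
        reference& operator=(bool value) noexcept {
            *word_ = value ? static_cast<C>(*word_ | mask_)
                           : static_cast<C>(*word_ & ~mask_);
            return *this;
        }
        // Assigning one proxy to another copies the bit value, not the target.
        reference& operator=(const reference& other) noexcept {
            return *this = static_cast<bool>(other);
        }
        constexpr operator bool() const noexcept { return (*word_ & mask_) != 0; }
    private:
        C* word_;
        C mask_;
    };

    // Views existing storage: no allocation, no deallocation, no resizing.
    constexpr bit_view(C* data, std::size_t size_in_bits) noexcept
        : data_(data), size_(size_in_bits) {}

    constexpr std::size_t size() const noexcept { return size_; }

    constexpr bool test(std::size_t pos) const noexcept {
        return (data_[pos / bits_per_word] & mask(pos)) != 0;
    }
    constexpr bool operator[](std::size_t pos) const noexcept { return test(pos); }
    reference operator[](std::size_t pos) noexcept {
        return reference(data_ + pos / bits_per_word, mask(pos));
    }

private:
    static constexpr C mask(std::size_t pos) noexcept {
        return static_cast<C>(C{1} << (pos % bits_per_word));
    }
    C* data_;           // not owned
    std::size_t size_;  // length in bits, fixed at construction
};
```

With this in place, `bit_view<std::uint32_t> bits(storage, 40); bits[35] = true;` sets bit 3 of `storage[1]`. Bounds checking, iterators, and whole-word operations are left out to keep the sketch short.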

If your clients disagree on which underlying unsigned integer type to use for storage (what you call the "bit container"), then you may also need to make your class a template on this type, although it would be simpler if everyone agreed on, say, uint64_t.

As far as std::vector<bool> goes, you don't need much from that: everything that vector does that you want, std::bitset probably does too: the main thing that vector adds is dynamic growth - but you've said you don't want that. vector<bool> has the proxy object concept to represent a single bit, but so does std::bitset.
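To illustrate the proxy that `std::bitset` already provides, here is a small example using only the standard `std::bitset<N>::reference` type; the function name `set_via_proxy` is made up for the demonstration.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: sets one bit of an 8-bit set through the proxy type.
inline std::bitset<8> set_via_proxy(std::size_t pos) {
    std::bitset<8> bits;
    // Non-const operator[] returns a proxy object, not a bool&.
    std::bitset<8>::reference bit = bits[pos];
    bit = true;  // writes through to the underlying storage of bits
    return bits;
}
```

The proxy holds enough information to locate the bit inside its storage word, which is the same idea a non-owning bit class would need.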

From std::span you take the idea of non-ownership of the underlying data, but I don't think this actually represents a lot of underlying code. You might want to consider the std::span approach of having either a compile-time known size or a runtime provided size (indicated by Extent == std::dynamic_extent) if that would be useful for you (mostly if you sometimes use compile-time sizes and could specialize some methods to be more efficient in that case).
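One way the compile-time versus runtime size choice might be sketched is shown below. The names `dynamic_bit_extent`, `size_storage`, and `bit_span` are hypothetical stand-ins (the first mimics `std::dynamic_extent`), and the use of a private base class for the size is just one possible design, not how any particular `std::span` implementation is necessarily written.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::dynamic_extent.
constexpr std::size_t dynamic_bit_extent = static_cast<std::size_t>(-1);

// Static extent: the size lives in the type, so nothing is stored.
template <std::size_t Extent>
class size_storage {
public:
    constexpr explicit size_storage(std::size_t) noexcept {}
    constexpr std::size_t size() const noexcept { return Extent; }
};

// Dynamic extent: the size is stored as a runtime member.
template <>
class size_storage<dynamic_bit_extent> {
public:
    constexpr explicit size_storage(std::size_t n) noexcept : size_(n) {}
    constexpr std::size_t size() const noexcept { return size_; }
private:
    std::size_t size_;
};

// Hypothetical non-owning view; only the size handling is shown.
template <typename C, std::size_t Extent = dynamic_bit_extent>
class bit_span : private size_storage<Extent> {
public:
    constexpr bit_span(C* data, std::size_t size_in_bits) noexcept
        : size_storage<Extent>(size_in_bits), data_(data) {}

    using size_storage<Extent>::size;  // length in bits
    constexpr C* data() const noexcept { return data_; }

private:
    C* data_;  // not owned
};
```

When the extent is static, `size()` is a compile-time constant, so member functions can be specialized or unrolled on it, and the object only needs to carry the pointer.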

BeeOnRope
  • Actually, it wouldn't be simpler for everyone to agree on uint64_t, for 2 reasons: 1. It's costly to use it on some non-x86 platforms (e.g. nVIDIA GPUs) 2. There are actual needs to instantiate this class for different container sizes due to certain practical considerations. – einpoklum Jun 15 '18 at 09:02
  • @einpoklum In that case, you could make it a template class over `C` (the bit container type) so the client code could choose the storage type. It complicates the implementation only slightly. One downside is that then bit sets with different underlying `C` choices are different types, so you can't directly pass a `dyn_bitset` with one `C` to something that expects a `dyn_bitset` with a different `C`, even though they expose the same API. If you change the accepting function to use a template argument it would work via duck typing, although this also has some downsides. – BeeOnRope Jun 15 '18 at 15:01
  • Yes, indeed, I already mentioned in the question that I'm templating over the container type. By the way, when I have some time to look into the std::bitset code and play with it, I'll decide whether to accept your answer (which I've +1'ed). – einpoklum Jun 15 '18 at 15:21
  • @einpoklum - well to be clear, you said your existing solution of standalone functions was templated over the container type. In the comments on the question I tried to extract more about the existing solution and why you didn't want to use that, or what more you wanted, but you didn't take the bait, so to speak. So as I mentioned, I wrote this more or less as if you were doing a greenfield implementation so things like whether it is a template or not were up for re-evaluation. – BeeOnRope Jun 15 '18 at 16:45
  • For a general purpose class like `bitset` (including a semi-dynamic[1] bitset as you want) using a template for the storage type would be quite annoying, since each storage type results in an unrelated object, so you'd want to agree on a usual type for compatibility, but your use case might be different, more narrow and favor including the storage in the type (e.g., for performance). [1] Here I'm using "semi-dynamic" to mean a runtime length, but which is fixed at construction for every object, as you want. – BeeOnRope Jun 15 '18 at 16:46