
I have been using the BitSet class in Java and I would like to do something similar in C. I suppose I would have to do it manually, as with most things in C. What would be an efficient way to implement it?

Should it be byte bitset[], or maybe bool bitset[]?

David Robles
  • Efficient in terms of memory or CPU? – robert Dec 07 '10 at 01:09
  • @robert: I'd say memory comes first, because the processing overhead of bit manipulation is small, while cache misses are expensive. – ruslik Dec 07 '10 at 01:23
  • @robert: there's a difference? If there are a large number of bits, performance will be bound by cache misses, so packing the bits as tightly as possible will give best performance. Only if there are very few bits might it be more efficient to use a whole byte (or more) per bit. – R.. GitHub STOP HELPING ICE Dec 07 '10 at 04:05
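To make the memory side of this concrete, here is a small sketch that compares how many bytes a one-bool-per-bit array needs with how many a packed array of unsigned char needs. The helper names `bool_array_bytes` and `packed_bytes` are hypothetical, made up for this illustration.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */

/* Bytes used by a one-bool-per-bit array: at least one byte per bit. */
static size_t bool_array_bytes(size_t nbits) {
    return nbits * sizeof(bool);
}

/* Bytes used when bits are packed CHAR_BIT to a byte, rounding up. */
static size_t packed_bytes(size_t nbits) {
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}
```

On a typical platform with 8-bit chars and a 1-byte bool, packing cuts the footprint by a factor of 8, which is what starts to matter once the array no longer fits in cache.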

7 Answers


CCAN has a bitset implementation you can use: http://ccan.ozlabs.org/info/jbitset.html

But if you do end up implementing it yourself (for instance, if you don't like that package's dependencies), you should use an array of unsigned ints, so that you work in the native word size of the architecture:

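#include <limits.h>  /* CHAR_BIT */
#include <stdlib.h>  /* calloc */

#define WORD_BITS (CHAR_BIT * sizeof(unsigned int))

/* size is the number of bits; round up to whole words */
unsigned int * bitarray = calloc((size + WORD_BITS - 1) / WORD_BITS, sizeof(unsigned int));

static inline void setIndex(unsigned int * bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] |= (1u << (idx % WORD_BITS));
}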

Don't hard-code a specific width (e.g. uint64_t or uint32_t); let the compiler use its native type and adapt to it using sizeof.
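Putting the pieces of this answer together, a minimal sketch of a complete word-based bit array might look like the following. It keeps the answer's `setIndex`; `clearIndex`, `testIndex`, `word_count` and `bitarray_new` are hypothetical names added for illustration. The mask is built from `1u` because shifting a plain `int` 1 into the sign bit is undefined behavior.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdbool.h> /* bool */
#include <stdlib.h>  /* calloc, free, size_t */

#define WORD_BITS (CHAR_BIT * sizeof(unsigned int))

/* Number of unsigned ints needed to hold nbits bits, rounded up. */
static size_t word_count(size_t nbits) {
    return (nbits + WORD_BITS - 1) / WORD_BITS;
}

/* Allocate a zeroed bit array; returns NULL on allocation failure. */
static unsigned int *bitarray_new(size_t nbits) {
    return calloc(word_count(nbits), sizeof(unsigned int));
}

static inline void setIndex(unsigned int *bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] |= 1u << (idx % WORD_BITS);
}

static inline void clearIndex(unsigned int *bitarray, size_t idx) {
    bitarray[idx / WORD_BITS] &= ~(1u << (idx % WORD_BITS));
}

static inline bool testIndex(const unsigned int *bitarray, size_t idx) {
    return (bitarray[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1u;
}
```

The caller owns the returned buffer and releases it with free() when done.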

Mike Axiak
  • 11,827
  • 2
  • 33
  • 49
  • 1
    Maybe, but also maybe you want the very largest size you can efficiently operate on. If you are scanning through bits then this can be efficient. Then again, the way some CPUs load caches from memory it doesn't matter what size you choose. But on the third hand ... maybe you just have to experiment and measure. – President James K. Polk Dec 07 '10 at 01:21
  • Certainly experiment, but in my experience using the word size to split on is generally fastest. I'm not sure I understand your first point? – Mike Axiak Dec 07 '10 at 01:25
  • `sizeof` is in bytes, not bits. You need to multiply by 8 (or more generally `CHAR_BIT`) in some of those expressions. – R.. GitHub STOP HELPING ICE Dec 07 '10 at 04:06
  • Isn't the first parameter to `calloc` wrong? I think it should be `(size + WORD_BITS - 1) / WORD_BITS` because that is the number of unsigned ints that is required. – Björn Lindqvist Nov 14 '16 at 21:38
  • Also, `(idx % WORD_BITS)` can be simplified to `(idx & (WORD_BITS - 1))` (valid because WORD_BITS is a power of two), but a good compiler probably does that optimization automatically. – Björn Lindqvist Nov 14 '16 at 21:51
  • If one is to follow the advice "Don't use a specific size (e.g. with uint64 or uint32), let the computer use what it wants to use and adapt to that using sizeof", wouldn't it make more sense to use `int_fastN_t` (or `uint_fastN_t`) from `<stdint.h>` instead of plain `int`? There is no guarantee that `int` is the most efficient type. On most 64-bit systems, `int` is only 32 bits, but a 64-bit type would probably be more efficient. – Simon Kissane Jun 25 '19 at 02:04
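The points raised in these comments can be combined in a short sketch. It assumes a hypothetical `word_t` typedef for `uint_fast32_t` (the implementation chooses its actual width, which may well be 64 bits), sizes the allocation with the rounding-up formula suggested above, and uses the `&` mask, which only matches `%` when `WORD_BITS` is a power of two; a C11 static assertion checks that. The `bits_*` names are made up for illustration.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* uint_fast32_t */
#include <stdlib.h>  /* calloc, free, size_t */

/* Hypothetical word type: let the implementation pick a "fast" width. */
typedef uint_fast32_t word_t;

#define WORD_BITS (CHAR_BIT * sizeof(word_t))

/* The & mask below is only equivalent to % when WORD_BITS is a power of two. */
_Static_assert((WORD_BITS & (WORD_BITS - 1)) == 0, "WORD_BITS must be a power of two");

static word_t *bits_alloc(size_t nbits) {
    /* Round up: 1..WORD_BITS bits need one word, WORD_BITS + 1 need two. */
    return calloc((nbits + WORD_BITS - 1) / WORD_BITS, sizeof(word_t));
}

static void bits_set(word_t *bits, size_t idx) {
    bits[idx / WORD_BITS] |= (word_t)1 << (idx & (WORD_BITS - 1));
}

static int bits_test(const word_t *bits, size_t idx) {
    return (bits[idx / WORD_BITS] >> (idx & (WORD_BITS - 1))) & 1;
}
```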

Nobody has mentioned what the C FAQ recommends, which is a set of good old macros:

#include <limits.h>     /* for CHAR_BIT */

/* a is an array of char, b is a bit index */
#define BITMASK(b) (1 << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITCLEAR(a, b) ((a)[BITSLOT(b)] &= ~BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)

(via http://c-faq.com/misc/bitsets.html)
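As a usage sketch, the macros can back a sieve of Eratosthenes over an unsigned char array. The function name `sieve_count_primes` and the `MAX` bound are made up for this example. The subset of macros it needs is repeated so the block compiles on its own, with `1u` in `BITMASK` and the argument of `BITNSLOTS` parenthesized.

```c
#include <limits.h>  /* CHAR_BIT */
#include <string.h>  /* memset */

#define BITMASK(b) (1u << ((b) % CHAR_BIT))
#define BITSLOT(b) ((b) / CHAR_BIT)
#define BITSET(a, b) ((a)[BITSLOT(b)] |= BITMASK(b))
#define BITTEST(a, b) ((a)[BITSLOT(b)] & BITMASK(b))
#define BITNSLOTS(nb) (((nb) + CHAR_BIT - 1) / CHAR_BIT)

#define MAX 100

/* Count primes below MAX; a set bit marks a number known to be composite. */
static int sieve_count_primes(void) {
    unsigned char composite[BITNSLOTS(MAX)];
    int count = 0;

    memset(composite, 0, sizeof composite);
    for (int i = 2; i < MAX; i++) {
        if (BITTEST(composite, i))
            continue;
        count++;
        for (int j = i + i; j < MAX; j += i)
            BITSET(composite, j);
    }
    return count;
}
```

With `MAX` at 100, the array needs only `BITNSLOTS(100)` bytes (13 with 8-bit chars) instead of 100.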

  • But this doesn't guard against side effects in macro arguments. For example, try: `int i = 0; unsigned char bits[1]; BITSET(bits, i++)` – Luke Smith Feb 06 '15 at 00:22
  • @LukeSmith You've got a point, but it seems fairly widely used. The proper way to use such a macro seems to be to make the caller understand it's a macro, thus putting the onus on the caller. (Anyone who doesn't like that can wrap it in an inline function.) – Opux Jan 25 '16 at 17:28
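Following that suggestion, here is a minimal sketch of `static inline` wrappers over an unsigned char array. The names `bit_set`, `bit_clear` and `bit_test` are hypothetical. Because these are functions, each argument is evaluated exactly once, so passing `i++` is safe.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdbool.h> /* bool */
#include <stddef.h>  /* size_t */

/* Each argument is evaluated once at the call site, unlike the macros. */
static inline void bit_set(unsigned char *a, size_t b) {
    a[b / CHAR_BIT] |= (unsigned char)(1u << (b % CHAR_BIT));
}

static inline void bit_clear(unsigned char *a, size_t b) {
    a[b / CHAR_BIT] &= (unsigned char)~(1u << (b % CHAR_BIT));
}

static inline bool bit_test(const unsigned char *a, size_t b) {
    return (a[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1u;
}
```

With these wrappers, `bit_set(bits, i++)` increments `i` once. The macro form expands `i++` twice within one expression, which is undefined behavior.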

Well, byte bitset[] seems a little misleading, no?

Use bit-fields in a struct; you can then maintain a collection of these structs (or use them otherwise as you see fit):

struct packed_struct {
  unsigned int b1:1;
  unsigned int b2:1;
  unsigned int b3:1;
  unsigned int b4:1;
  /* etc. */
} packed;
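A short usage sketch of the bit-field approach, with a made-up `flags` struct and a hypothetical `mark_dirty` helper. Note that bit-fields are accessed by name, not by a runtime index, and their layout within the struct is implementation-defined, so this suits a small, fixed set of named flags better than a general-purpose bitset.

```c
#include <stdbool.h>  /* bool */

/* Hypothetical example: three named one-bit flags. */
struct flags {
    unsigned int visible : 1;
    unsigned int dirty   : 1;
    unsigned int locked  : 1;
};

/* Mark an object dirty unless it is locked; returns whether it changed. */
static bool mark_dirty(struct flags *f) {
    if (f->locked)
        return false;
    f->dirty = 1;
    return true;
}
```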
Ed S.

I recommend my BITSCAN C++ library (version 1.0 has just been released). BITSCAN is specifically oriented towards fast bitscan operations. I have used it to solve NP-hard combinatorial problems involving simple undirected graphs, such as maximum clique (see the BBMC algorithm, a leading exact solver).

A comparison between BITSCAN and the standard solutions, the STL bitset and Boost dynamic_bitset, is available here: http://blog.biicode.com/bitscan-efficiency-at-glance/
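BITSCAN itself is C++, but the core idea of a fast bit scan can be sketched in plain C. The loop visits only the set bits of each 64-bit word by repeatedly clearing the lowest set bit with `w &= w - 1`. The function names are hypothetical, and the portable `lowest_bit_index` loop could be replaced by a compiler intrinsic such as GCC/Clang's `__builtin_ctzll` where available.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint64_t */

/* Index of the lowest set bit of a nonzero word (portable, not the fastest). */
static unsigned lowest_bit_index(uint64_t w) {
    unsigned i = 0;
    while ((w & 1u) == 0) {
        w >>= 1;
        i++;
    }
    return i;
}

/* Write the index of every set bit into out, in increasing order, and return
   how many were written. out must have room for every set bit. */
static size_t collect_set_bits(const uint64_t *words, size_t nwords, size_t *out) {
    size_t n = 0;
    for (size_t k = 0; k < nwords; k++) {
        uint64_t w = words[k];
        while (w != 0) {
            out[n++] = k * 64 + lowest_bit_index(w);
            w &= w - 1;  /* clear the lowest set bit */
        }
    }
    return n;
}
```

The outer loop cost grows with the number of words, but the inner loop runs once per set bit, which pays off for sparse sets.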

chesslover

You can give my PackedArray code a try with a bitsPerItem of 1.

It implements a random-access container where items are packed at the bit level. In other words, it acts as if you were able to manipulate an array of, e.g., uint9_t or uint17_t:

PackedArray principle:
  . compact storage of <= 32 bits items
  . items are tightly packed into a buffer of uint32_t integers

PackedArray requirements:
  . you must know in advance how many bits are needed to hold a single item
  . you must know in advance how many items you want to store
  . when packing, behavior is undefined if items have more than bitsPerItem bits

PackedArray general in memory representation:
  |-------------------------------------------------- - - -
  |       b0       |       b1       |       b2       |
  |-------------------------------------------------- - - -
  | i0 | i1 | i2 | i3 | i4 | i5 | i6 | i7 | i8 | i9 |
  |-------------------------------------------------- - - -

  . items are tightly packed together
  . several items end up inside the same buffer cell, e.g. i0, i1, i2
  . some items span two buffer cells, e.g. i3, i6
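To illustrate the layout described above (items tightly packed into uint32_t cells, some spanning two cells), here is an independent minimal sketch. It is not PackedArray's actual code or API; `packed_get` and `packed_set` are hypothetical names, and `bits_per_item` is assumed to be between 1 and 32. With a `bits_per_item` of 1 it behaves as a plain bitset.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint32_t, uint64_t, UINT64_C */

/* Read item i, where each item occupies bits_per_item bits (1..32). */
static uint32_t packed_get(const uint32_t *buf, unsigned bits_per_item, size_t i) {
    size_t bit = i * bits_per_item;
    size_t cell = bit / 32;
    unsigned shift = bit % 32;
    uint64_t mask = (UINT64_C(1) << bits_per_item) - 1;
    /* Combine two adjacent cells only if the item crosses a cell boundary. */
    uint64_t pair = buf[cell];
    if (shift + bits_per_item > 32)
        pair |= (uint64_t)buf[cell + 1] << 32;
    return (uint32_t)((pair >> shift) & mask);
}

/* Write item i; bits of value above bits_per_item are discarded. */
static void packed_set(uint32_t *buf, unsigned bits_per_item, size_t i, uint32_t value) {
    size_t bit = i * bits_per_item;
    size_t cell = bit / 32;
    unsigned shift = bit % 32;
    uint64_t mask = (UINT64_C(1) << bits_per_item) - 1;
    int spans = shift + bits_per_item > 32;
    uint64_t pair = buf[cell];
    if (spans)
        pair |= (uint64_t)buf[cell + 1] << 32;
    /* Clear the item's old bits, then OR in the new value. */
    pair = (pair & ~(mask << shift)) | (((uint64_t)value & mask) << shift);
    buf[cell] = (uint32_t)pair;
    if (spans)
        buf[cell + 1] = (uint32_t)(pair >> 32);
}
```

The buffer must hold at least (count * bits_per_item + 31) / 32 cells; the second cell is only touched when an item actually spans it, so no read goes past the end.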
Gregory Pakosz

As usual you need to first decide what sort of operations you need to perform on your bitset. Perhaps some subset of what Java defines? After that you can decide how best to implement it. You can certainly look at the source for BitSet.java in OpenJDK for ideas.
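For example, if a subset such as Java's `and`, `or` and `cardinality` is all you need, each one is a simple loop over words. Here is a sketch, assuming 64-bit `uint64_t` words and hypothetical `bits_*` names; these mirror the Java methods in spirit but are not a port of the OpenJDK code.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint64_t */

/* Like Java's BitSet.and: dst &= src, word by word. */
static void bits_and(uint64_t *dst, const uint64_t *src, size_t nwords) {
    for (size_t k = 0; k < nwords; k++)
        dst[k] &= src[k];
}

/* Like Java's BitSet.or: dst |= src, word by word. */
static void bits_or(uint64_t *dst, const uint64_t *src, size_t nwords) {
    for (size_t k = 0; k < nwords; k++)
        dst[k] |= src[k];
}

/* Like Java's BitSet.cardinality: the number of set bits. */
static size_t bits_cardinality(const uint64_t *words, size_t nwords) {
    size_t count = 0;
    for (size_t k = 0; k < nwords; k++)
        for (uint64_t w = words[k]; w != 0; w &= w - 1)
            count++;  /* each iteration clears one set bit */
    return count;
}
```

Unlike Java's BitSet, this fixed-size sketch does not grow automatically; both operands must have the same number of words.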

President James K. Polk

Make it an array of 64-bit unsigned integers (uint64_t).
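Here is a minimal sketch of that suggestion, using `uint64_t` from `<stdint.h>`. The `U64_WORDS` macro and the `u64_*` function names are hypothetical. The mask must be built from a 64-bit one, e.g. `UINT64_C(1)`, because shifting a 32-bit `int` 1 by 32 or more positions is undefined.

```c
#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint64_t, UINT64_C */

/* Number of 64-bit words needed for nbits bits, rounded up. */
#define U64_WORDS(nbits) (((nbits) + 63) / 64)

static inline void u64_set(uint64_t *bits, size_t i) {
    bits[i / 64] |= UINT64_C(1) << (i % 64);
}

static inline void u64_clear(uint64_t *bits, size_t i) {
    bits[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

static inline bool u64_test(const uint64_t *bits, size_t i) {
    return (bits[i / 64] >> (i % 64)) & 1;
}
```

For a fixed size known at compile time, `uint64_t bits[U64_WORDS(n)] = {0};` declares a zeroed set.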

EvilTeach