Given a range of indexes (identifiers), where I want to map each index to a boolean value, that is:
// interface pseudocode
interface bitmap {
bool identifier_is_set(unsigned int id_idx) const;
void set_identifier(unsigned int id_idx, bool val); // not const: it modifies the bitmap
};
so that I can set and query for each ID (index) if it is set or not, what would you prefer to use to implement this?
I think this is called a bit array or bitmap or bitset, correct me if I'm wrong.
Assume that the maximum identifier is predetermined and not greater than 1e6 (1m), possibly much smaller (10k - 100k). (Which means the size used by sizeof(int)*maximum_id_idx easily fits into memory.)
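To put rough numbers on that, here is a small sketch of the arithmetic. The helper names bytes_unpacked and bytes_packed are made up for illustration, and the figure for int depends on the platform's sizeof(int).

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for one flag per identifier when each flag occupies sizeof(T) bytes
// (the std::vector<char> / std::vector<int> style of storage).
template <typename T>
constexpr std::size_t bytes_unpacked(std::size_t id_count) {
    return id_count * sizeof(T);
}

// Bytes needed when flags are packed eight to a byte, rounded up
// (the bit-per-flag style of storage).
constexpr std::size_t bytes_packed(std::size_t id_count) {
    return (id_count + 7) / 8;
}

// For the stated upper bound of 1e6 identifiers:
static_assert(bytes_unpacked<char>(1000000) == 1000000, "one byte per flag");
static_assert(bytes_packed(1000000) == 125000, "one bit per flag");
```

With a 4-byte int, bytes_unpacked<int>(1000000) comes to about 4 MB, so every variant is small in absolute terms.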
Possible solutions I see so far:
std::set<size_t>
- Add or erase the identifier in this set as necessary. This would allow for arbitrarily large identifiers as long as the bitmap is sparse.
std::vector<bool>
- Sized to the appropriate maximum value, storing true or false for each id_idx.
std::vector<char>
- Same thing, but not suffering from the weird std::vector<bool> problems. Uses less memory than vector<int>.
std::vector<int>
- Using an int as the boolean flag to have a container using the natural word size of the machine. (No clue if that could make a difference.)
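To make the options concrete, this is roughly what I imagine the wrapping class would look like for the std::vector<char> variant. It is only a sketch: the class name IdBitmap and the constructor parameter max_id are made up for illustration, and swapping the member for one of the other containers should not change the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper (the name IdBitmap is an assumption, not an existing type).
class IdBitmap {
public:
    // max_id is the largest identifier that will ever be used (inclusive).
    explicit IdBitmap(std::size_t max_id) : flags_(max_id + 1, 0) {}

    // Query: a single indexed load, no bit masking.
    bool identifier_is_set(unsigned int id_idx) const {
        assert(id_idx < flags_.size());
        return flags_[id_idx] != 0;
    }

    // Update: non-const, since it modifies the stored flag.
    void set_identifier(unsigned int id_idx, bool val) {
        assert(id_idx < flags_.size());
        flags_[id_idx] = val ? 1 : 0;
    }

private:
    std::vector<char> flags_;  // one byte per identifier, all initially unset
};
```

Usage would be along the lines of IdBitmap bm(100000); bm.set_identifier(42, true); followed by bm.identifier_is_set(42).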
Please answer which container type you would prefer and why, given the maximum id restriction cited above and especially considering performance aspects of querying the bitmap (inserting performance does not matter).
Note: The interface usage of vector vs. set does not matter, as it will be hidden behind its wrapping class anyway.
EDIT: To add to the discussion about std::bitset: std::bitset stores the whole array inside the object itself, so sizeof(std::bitset<1000000>) is approximately 1/8 megabyte. That makes for a huge single object, and for something you probably do not want to put on the stack anymore (which may or may not be relevant).
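A minimal sketch of the std::bitset point, assuming the maximum is fixed at compile time. The names kMaxIds, make_id_bitset and id_is_set are made up for illustration; allocating the object on the heap is one way around the stack concern.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <memory>

constexpr std::size_t kMaxIds = 1000000;  // assumed compile-time upper bound

// The bits are stored inside the object, so it occupies at least N/8 bytes.
static_assert(sizeof(std::bitset<kMaxIds>) >= kMaxIds / 8, "bits are stored inline");

// Create the bitset on the heap instead of as a large stack object.
// The default constructor leaves every bit unset.
std::unique_ptr<std::bitset<kMaxIds>> make_id_bitset() {
    return std::unique_ptr<std::bitset<kMaxIds>>(new std::bitset<kMaxIds>());
}

// Query helper with a debug-only bounds check.
bool id_is_set(const std::bitset<kMaxIds>& bits, std::size_t id_idx) {
    assert(id_idx < kMaxIds);
    return bits[id_idx];
}
```

The unique_ptr can then be held as a member of the wrapping class, which keeps the wrapper itself small.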