
We can use a bitmask to efficiently represent set membership over a finite (or at least indexed) domain. For instance, the letters in car can be represented as a 26-bit set like so:

abcdefghijklmnopqrstuvwxyz
10100000000000000100000000

However, this can of course only represent presence, not duplicates: carry, for instance, has two r's, but a set cannot represent that.
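A minimal Python sketch of the 26-bit set above, assuming words contain only lowercase ASCII letters. The helper names `letter_set` and `show` are hypothetical. Bit 0 holds a, and `show` prints a on the left so the output matches the table. The last line illustrates the limitation: cary and carry produce the same mask.

```python
def letter_set(word: str) -> int:
    """Return a 26-bit mask with bit i set if letter i (a = 0) occurs in word."""
    mask = 0
    for ch in word:
        if not 'a' <= ch <= 'z':
            raise ValueError(f"unsupported character: {ch!r}")
        mask |= 1 << (ord(ch) - ord('a'))
    return mask


def show(mask: int) -> str:
    """Render the mask with 'a' on the left, matching the table above."""
    return ''.join('1' if (mask >> i) & 1 else '0' for i in range(26))


print(show(letter_set("car")))                    # 10100000000000000100000000
print(letter_set("carry") == letter_set("cary"))  # True: the second r is lost
```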

A multiset tracks a count for each element, not just its presence, so it can represent duplicates. However, it's not clear to me whether a multiset can be represented sensibly in a single number.

One idea, suggested by a coworker, would be to use primes as our indices (a = 2, b = 3, c = 5, and so on) and represent a multiset by its prime factorization. Our examples above would then become:

car   = 2^1 * 3^0 * 5^1 * ... * 61^1 * ...
carry = 2^1 * 3^0 * 5^1 * ... * 61^2 * ... * 97^1 * 101^0
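A rough Python sketch of the prime-factorization encoding, assuming lowercase ASCII letters mapped to the first 26 primes as in the examples (r is the 18th prime, 61; y is the 25th, 97). The function names are hypothetical. Reading a count back requires repeated division by the letter's prime.

```python
from math import gcd

# The first 26 primes, one per letter: a -> 2, b -> 3, c -> 5, ..., r -> 61, ..., y -> 97, z -> 101.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def prime_for(letter: str) -> int:
    if len(letter) != 1 or not 'a' <= letter <= 'z':
        raise ValueError(f"unsupported character: {letter!r}")
    return PRIMES[ord(letter) - ord('a')]


def encode(word: str) -> int:
    """Multiset of letters -> product of their primes (1 is the empty multiset)."""
    n = 1
    for ch in word:
        n *= prime_for(ch)
    return n


def count(n: int, letter: str) -> int:
    """Multiplicity of letter = exponent of its prime, found by repeated division."""
    p = prime_for(letter)
    c = 0
    while n % p == 0:
        n //= p
        c += 1
    return c


print(encode("car"), encode("carry"))                           # 610 3609370
print(count(encode("carry"), "r"))                              # 2
print(encode("carry") % encode("car") == 0)                     # True: car is contained in carry
print(gcd(encode("carry"), encode("rarely")) == encode("arry"))  # True: gcd is the intersection
```

Multiset operations map onto arithmetic: adding two multisets is multiplication, containment is divisibility, intersection is `gcd` and union (per-element maximum) is `math.lcm`. The cost is size: each occurrence of a letter adds about log2 of its prime in bits, so each z adds roughly 6.7 bits.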

Is this a sound way to represent multisets? Are there better binary representations of such a concept?

dimo414

1 Answer


Trivial: use k bits instead of 1 bit for each element of the universe, and store that element's count in those bits. Concatenate them to get a single number if you care about that, but you can equivalently treat it as an array of numbers (the bitset equivalent, an array of booleans, is also valid and useful).

This probably takes more space than the prime factor approach, but on the bright side it is still very space-efficient, and you can test presence (and extract the count) with an array lookup and some bit fiddling, as opposed to looking up or computing the relevant prime and performing repeated integer division.
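A minimal Python sketch of the k-bits-per-element idea, packed into a single integer. The choice k = 8 (counts up to 255) and the helper names are assumptions for illustration. Extracting a count is one shift and one mask, and `add` refuses to let a full counter carry into its neighbour.

```python
K = 8                  # bits per counter (assumed); each count must stay <= 255
FIELD = (1 << K) - 1   # mask selecting one K-bit counter


def index(letter: str) -> int:
    if len(letter) != 1 or not 'a' <= letter <= 'z':
        raise ValueError(f"unsupported character: {letter!r}")
    return ord(letter) - ord('a')


def count(n: int, letter: str) -> int:
    """Shift the letter's K-bit field down to bit 0 and mask it off."""
    return (n >> (index(letter) * K)) & FIELD


def add(n: int, letter: str) -> int:
    """Increment one counter; raise rather than overflow into the next field."""
    if count(n, letter) == FIELD:
        raise OverflowError(f"count for {letter!r} exceeds {FIELD}")
    return n + (1 << (index(letter) * K))


def encode(word: str) -> int:
    n = 0
    for ch in word:
        n = add(n, ch)
    return n


carry = encode("carry")
print(count(carry, "r"), count(carry, "y"), count(carry, "b"))  # 2 1 0
print(carry.bit_length() <= 26 * K)                             # True: fits in 208 bits
```

The "array of numbers" view is the same data unpacked, e.g. `[count(n, chr(ord('a') + i)) for i in range(26)]`; in Python, `collections.Counter` is the idiomatic multiset if a single number is not required.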

  • What is `k` in this context? A maximum count of a given element? That seems undesirable, no? – dimo414 Feb 26 '14 at 15:26
  • @dimo414 k could be any natural number as far as my answer is concerned. Appropriate values depend on the expected maximum count, though 32 or 64 bits are efficient and more than enough for virtually any data set. One might also use arbitrary precision numbers, using either dynamic memory allocation or expanding the array as needed. But again, there's most likely a very reasonable upper limit on k. –  Feb 26 '14 at 15:39
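One way to "expand as needed" without fixing k up front is to widen every field when a counter fills up. The sketch below doubles k and repacks all 26 counts on overflow; the class name and the doubling policy are hypothetical choices, not something the comment specifies, and it assumes lowercase ASCII letters.

```python
class GrowingMultiset:
    """Hypothetical sketch: 26 letter counts packed in one int, k bits each.

    When a counter is full, k doubles and all fields are repacked,
    so no maximum count has to be chosen in advance.
    """

    def __init__(self, k: int = 1):
        self.k = k
        self.bits = 0

    @staticmethod
    def _index(letter: str) -> int:
        if len(letter) != 1 or not 'a' <= letter <= 'z':
            raise ValueError(f"unsupported character: {letter!r}")
        return ord(letter) - ord('a')

    def _field(self) -> int:
        return (1 << self.k) - 1

    def count(self, letter: str) -> int:
        return (self.bits >> (self._index(letter) * self.k)) & self._field()

    def _widen(self) -> None:
        # Read every count at the old width, then repack at double width.
        counts = [(self.bits >> (i * self.k)) & self._field() for i in range(26)]
        self.k *= 2
        self.bits = 0
        for i, c in enumerate(counts):
            self.bits |= c << (i * self.k)

    def add(self, letter: str) -> None:
        if self.count(letter) == self._field():
            self._widen()
        self.bits += 1 << (self._index(letter) * self.k)


m = GrowingMultiset()
for ch in "carry":
    m.add(ch)
print(m.k, m.count("r"), m.count("y"))  # 2 2 1
```

Repacking costs a pass over all fields, but it happens only when the largest count crosses a power of two, so in practice k settles at a small value, in line with the comment's point that a reasonable upper limit usually exists.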