Are there constant-time algorithms for binary set intersection and union?

I imagine using bitmaps whose bits point to elements in memory, with OR for union and AND for intersection.

Does anyone know of a solution?

Cetin Sert
  • That will be at least O(n), because the number of OR/AND operations you need to perform increases proportionally with the set size. – Gabe Moothart Oct 18 '10 at 18:11
  • Sets of what? Arbitrary 32-bit integers? – Mark Byers Oct 18 '10 at 18:11
  • When I've had to do this, I've pretty much used the approach you describe. The main difference being that I've generally been working with small sets, so I just used integers or arrays of integers. Instead of using them as pointers directly, however, I treat them as indices. Bitmaps might allow some small savings in terms of range checking, but IIRC the C# compiler optimizes range checking for arrays in some situations. As Gabe points out, this is actually O(n/p), where p is the implicit parallelism from the bit vector. – TechNeilogy Oct 18 '10 at 18:11
  • @Mark: hashes of strings, so yes this equals arbitrary 32-bit integers for .NET. – Cetin Sert Oct 18 '10 at 18:14
  • 1
    @TechNeilogy - actually, the JIT (not the compiler) when doing a 'for(int i = 0; i < arr.Length; i++) {...}' – Marc Gravell Oct 18 '10 at 18:14
  • @TechNeilogy, @Gabe: So using bitwise operations for union and intersection would not help much if I had many more than 32 or 64 members in my sets? I would nonetheless like to see some source code using that approach. – Cetin Sert Oct 18 '10 at 18:20

1 Answer

It is constant time for up to 32 elements with the BitArray class, which stores its bits in an int[], so a single AND or OR covers 32 members. You could write a custom one to get up to 64 elements per operation, using an underlying ulong[]. Unmanaged code makes 128 elements possible with the _mm_or_si128 and _mm_and_si128 intrinsics. Those are hard to use because of their memory alignment requirements, which you can't get from the garbage-collected heap.
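
For anyone who wants to see source for the custom ulong[] idea, here is a minimal sketch rather than a finished implementation. The UlongBitSet type and all of its members are made up for illustration, and it assumes every set member has already been mapped to a small non-negative index; raw 32-bit hashes used directly as indices would need a 2^32-bit (512 MB) map.

```csharp
using System;

// Hypothetical type for illustration; not part of the .NET class library.
// Assumes members are small non-negative indices below 'capacity'.
public sealed class UlongBitSet
{
    private readonly ulong[] words;

    public UlongBitSet(int capacity)
    {
        // Each ulong word holds 64 members; round up to whole words.
        words = new ulong[(capacity + 63) / 64];
    }

    public void Add(int index)
    {
        words[index >> 6] |= 1UL << (index & 63);
    }

    public bool Contains(int index)
    {
        return (words[index >> 6] & (1UL << (index & 63))) != 0;
    }

    // In-place union: one OR per 64 members.
    public void UnionWith(UlongBitSet other)
    {
        CheckSameSize(other);
        for (int i = 0; i < words.Length; i++)
            words[i] |= other.words[i];
    }

    // In-place intersection: one AND per 64 members.
    public void IntersectWith(UlongBitSet other)
    {
        CheckSameSize(other);
        for (int i = 0; i < words.Length; i++)
            words[i] &= other.words[i];
    }

    private void CheckSameSize(UlongBitSet other)
    {
        if (other.words.Length != words.Length)
            throw new ArgumentException("Sets must have the same capacity.");
    }
}
```

With a capacity of 64 or less each loop runs exactly once, which is the constant-time case; beyond that the work grows with capacity / 64.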

These are not practical set sizes in almost any case where you'd want to optimize this kind of code. It is fundamentally an O(n) algorithm with a very small constant factor. You might as well use BitArray.
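
As a rough sketch of what the BitArray route can look like: the BitArraySets class and its two method names below are hypothetical helpers, while BitArray, its copy constructor, And and Or are the real System.Collections API. It assumes both inputs have the same Length and that bit i stands for member i.

```csharp
using System.Collections;

// Hypothetical helper class for illustration only.
public static class BitArraySets
{
    // BitArray.Or and BitArray.And overwrite the instance they are called on,
    // so copy the left operand first to leave both inputs untouched.
    // Both arrays must have the same Length, otherwise the call throws.
    public static BitArray Union(BitArray a, BitArray b)
    {
        return new BitArray(a).Or(b);
    }

    public static BitArray Intersection(BitArray a, BitArray b)
    {
        return new BitArray(a).And(b);
    }
}
```

For example, with a = {1, 3, 5} and b = {3, 5, 6} stored in length-8 arrays, Union yields bits 1, 3, 5 and 6 set, and Intersection yields bits 3 and 5.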

Hans Passant