Creating a data structure of integers and finding which component a given integer lies in

Question

I have a set of 32 bit integers with which I will be making a data structure that will put these integers in different components ( like dsu ) the join operation has to be performed using bitwise and, if the bitwise and is not zero of two integers they lie in the same component, now given an integer how do I get ( assuming the data structure is already created for the prefix of this current integer ) the component in which this integer will lie? ex say [1,5,3,8] are given set of integers than (1,5,8) lie in one component because lsb for any pair will ensure non zero "bit and" and (8) will lie in it's own component, Or quite possibly entirely different algorithm that is much simpler if you can think of,

Edit :

It is possible however that two integers have bit and as zero but if third integer is introduced which has pairwise bit and non zero with both of them then all three will lie in same component ex [2,12, 6] 2 and 12 on operation give 0 but 2 and 6 do not and 12 and 6 do not so 6 will unify all three in one component.

Edit 2 : I am able to think of a naive algorithm that uses 32 length array for which one entry consists of linked list like structure ( like adjecentcy list ) that is if i'th bit is set then index i will contain all the integers with i'th set bit, but it could give me O(n) search, I'm thinking more towards dsu, but finding it difficult how to create it.

Does every pair of integers in a component have to share a bit position, or just one pair for each member? — kcsquared, Feb 12 '22 at 11:57
No not necessarily. There could exist a pair in a component that will have bit and zero but it may lie in the same component via some other integer — Pawan Nirpal, Feb 12 '22 at 11:59
What happens when you add 9 to this structure? It would belong in both components, so would it merge them together? — kaya3, Feb 12 '22 at 12:26

score 2 · Accepted Answer · answered Feb 12 '22 at 13:11

Here's some go code (runnable link) that represents components as a list of disjoint bitmasks. A previously seen integer is in the component that it has non-zero bitwise AND with (or the zero component if it is zero).

The list of components can be updated with a new integer by finding all components that it has non-zero bitwise AND with, and replacing them with a single component that is the bitwise OR of those components and the new element.

There is at most one component per bit in any input value, plus one if there's a zero component. That's 32 components at most here, because the input values are stated to be 32-bits.

package main

import (
    "fmt"
)

// makeComponents returns a slice of disjoint bitmasks
// that represent the different components in the input.
func makeComponents(xs []uint32) []uint32 {
    var cs []uint32
    for _, x := range xs {
        j := 0
        for i, c := range cs {
            cs[j] = cs[i]
            if c & x != 0 || (x == 0 && c == 0) {
                x |= c
            } else {
                j++
            }
        }
        cs = append(cs[:j], x)
    }
    return cs
}

// component returns the component (from cs) that x is in.
func component(cs []uint32, x uint32) int {
    for i, c := range cs {
        if x==c || x & c != 0 {
            return i
        }
    }
    panic("no component")
}

// showComponents outputs the input values with those in the same
// component on the same line.
func showComponents(xs []uint32) {
    cs := makeComponents(xs)
    xperc := make([][]uint32, len(cs))
    for _, x := range xs {
        c := component(cs, x)
        xperc[c] = append(xperc[c], x)
    }
    for _, xc := range xperc {
        fmt.Println(xc)
    }
}

func main() {
    xs := []uint32{0, 1, 3, 6, 8, 16, 48, 65}
    showComponents(xs)
}

Hey I did get the same components but my program prints only unified components can you please review my code and offer feedback. link : https://onlinegdb.com/w_tqptn5Z — Pawan Nirpal, Feb 12 '22 at 15:26

kcsquared · Answer 2 · 2022-02-12T12:59:21.333

1

The requirements imply that an incoming integer x should be joined with component C if and only if the bitwise AND of x with the (bitwise OR of all numbers in C) is nonzero.

A disjoint-set data structure would work well for this. The 'root' of each component could also maintain the bitwise OR of all of its members, which can be done with a separate hashmap, although this isn't strictly necessary.

You should also keep a hashmap from each bit-position to any member of the (unique, if it exists) component that has that bit-position set. When adding a new integer, scan over all of its set-bits: use this hashmap to find all the components to union, or whether to make a new component. This should give almost constant-time updates. There are also at most 1 + ceil(log_2(M)) different components at any time, where M is the largest integer you've seen, which would be at most 33.

If you want to avoid some of the complexity of the Disjoint-Set code, you could go with Paul Hankin's suggestion of a list of components. Depending on your programming language and whether you want to count duplicates, a component could be represented as a (bitmask, multiset) pair, where the bitmask is the bitwise OR of all members, and the multiset of members lets you print all members of a component and test for containment in O(1) time, although insertions could take longer (but still constant amortized time).

If you never need to print the members of a component or check whether a particular integer was ever actually added, it suffices to just store the bitmasks.

edited Feb 12 '22 at 12:59

answered Feb 12 '22 at 12:14

kcsquared

5,244
1
11
36

Unless the ints are really big, you don't even need a disjoint-set data structure I think, but just a list of components, each represented by a bitmask. When a new int comes in, merge all the components that intersect with it, or add a new component if none exists. – Paul Hankin Feb 12 '22 at 12:20
@PaulHankin Can you please elaborate? as if you were explaining it to a beginner ( I really am ) – Pawan Nirpal Feb 12 '22 at 12:21
@PaulHankin I considered that, but to save a mapping from each integer to its component (and support testing whether two integers are in the same component), wouldn't you basically be implementing a disjoint-set data structure? – kcsquared Feb 12 '22 at 12:23
1

I was thinking that one doesn't create the mapping, but given an int you find its component by scanning through all the components (there are at most 32), and finding the one that has non-zero and. – Paul Hankin Feb 12 '22 at 12:25
@PaulHankin I'm assuming it can be proved with piegon hole principle, am I correct? – Pawan Nirpal Feb 12 '22 at 12:35
@PawanNirpal I'm sorry, but I don't know what it is you want to prove with the pigeonhole principle. – Paul Hankin Feb 12 '22 at 12:39
To prove that there are utmost 32 components. – Pawan Nirpal Feb 12 '22 at 12:41
1

There can be 33 components, if 0 belongs in its own component. – kaya3 Feb 12 '22 at 12:50
Totally forgot about that, thanks Kaya3 – Pawan Nirpal Feb 12 '22 at 12:54
@PawanNirpal Yes, pigeonhole principle would work. If there were more than 33 components, two components would have to share a set-bit, as there are only 32 set-bits available and at most one component has no set-bit. – kcsquared Feb 12 '22 at 13:01

Creating a data structure of integers and finding which component a given integer lies in

2 Answers2