
I want to compute the union of several sets as efficiently as possible, because the time this takes has a significant effect on the whole system.

Suppose our sets look like this:

s1 - 1, 2, 3, 4, 5, 6
s2 - 1, 2, 4, 8, 10, 12, 15, 18, 21
s3 - 1, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33

One solution I have seen computes the pairwise intersections of the sets beforehand:

s1, s2 (or s12) - 1, 2, 4 
s1, s3 (or s13) - 1 
s2, s3 (or s23) - 1

When the union of these 3 sets is needed, both the individual sets and the intersection sets are already available, so the union is built like this:

res1 = Sets.difference(s1, s12) 
res1 = Sets.difference(res1, s13)
finalRes.addAll(res1)

res2 = Sets.difference(s2, s23) 
finalRes.addAll(res2)

finalRes.addAll(s3)
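
For reference, the steps above can be written as a small runnable sketch. It uses only `java.util`, so the `difference` helper here is a hypothetical stand-in for Guava's `Sets.difference` (which returns an unmodifiable view rather than a copy), and the class and method names are made up for illustration. Integers are used only because the sample sets use them.

```java
import java.util.HashSet;
import java.util.Set;

final class PrecomputedIntersectionUnion {

    // Elements of a that are not in b. Hypothetical stand-in for Guava's
    // Sets.difference; unlike Guava's view, this copies eagerly.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        Set<T> out = new HashSet<>(a);
        out.removeAll(b);
        return out;
    }

    // Union built from the individual sets plus the precomputed intersections,
    // following the same order of steps as the pseudocode in the question.
    static <T> Set<T> unionViaIntersections(Set<T> s1, Set<T> s2, Set<T> s3,
                                            Set<T> s12, Set<T> s13, Set<T> s23) {
        Set<T> finalRes = new HashSet<>();
        Set<T> res1 = difference(difference(s1, s12), s13);
        finalRes.addAll(res1);
        Set<T> res2 = difference(s2, s23);
        finalRes.addAll(res2);
        finalRes.addAll(s3);
        return finalRes;
    }

    public static void main(String[] args) {
        Set<Integer> s1 = Set.of(1, 2, 3, 4, 5, 6);
        Set<Integer> s2 = Set.of(1, 2, 4, 8, 10, 12, 15, 18, 21);
        Set<Integer> s3 = Set.of(1, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33);
        Set<Integer> s12 = Set.of(1, 2, 4);
        Set<Integer> s13 = Set.of(1);
        Set<Integer> s23 = Set.of(1);

        Set<Integer> union = unionViaIntersections(s1, s2, s3, s12, s13, s23);
        System.out.println(union.size()); // 22 distinct elements
    }
}
```

Note that each `difference` call still has to visit every element of its first argument, so this version touches every element of s1 and s2 at least once before the `addAll` calls run.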

Actually, I think this solution is efficient, but I wonder:

  • Can we use Guava's Sets.union directly? It seems to me that the solution above is more efficient.
  • Can we use a single final set and, for each element of each set, call contains to check whether the final set already holds it before adding it?

What do you suggest if we have 100 sets?
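
As a baseline for the 100-set case, here is a minimal sketch of the "single final set" idea, assuming the elements have sensible `hashCode`/`equals` implementations. `HashSet.add` already ignores elements that are present, so no separate `contains` check is needed. The class name, the method name `unionAll`, and the presizing heuristic are illustrative assumptions, not anything taken from Guava.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ManySetUnion {

    // Adds every input set to one HashSet. The result is presized for the
    // worst case (no overlap at all) so it does not rehash while growing.
    static <T> Set<T> unionAll(Collection<Set<T>> sets) {
        long total = 0;
        for (Set<T> s : sets) {
            total += s.size();
        }
        // Capacity large enough to hold `total` entries at the default 0.75 load factor.
        int capacity = (int) Math.min(1 << 30, total * 4 / 3 + 1);
        Set<T> result = new HashSet<>(capacity);
        for (Set<T> s : sets) {
            result.addAll(s); // duplicates are dropped by the set itself
        }
        return result;
    }

    public static void main(String[] args) {
        // 100 overlapping sets: set i holds the ten values 5*i .. 5*i + 9.
        List<Set<Integer>> sets = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            Set<Integer> s = new HashSet<>();
            for (int v = 5 * i; v < 5 * i + 10; v++) {
                s.add(v);
            }
            sets.add(s);
        }
        System.out.println(unionAll(sets).size()); // 505, the values 0..504
    }
}
```

This does one hash insertion per input element regardless of how many sets there are, and it needs no precomputed intersections. Whether it beats the intersection-based approach on real data is something to measure rather than assume.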

divibisan
rsc
  • What do you want to _do_ with the union? The [Union Find](https://en.wikipedia.org/wiki/Disjoint-set_data_structure) algorithm runs union/find in `O(lg* n)` time. Basically, what operations does your data structure need to support? The Guava methods are as efficient as they can be whilst operating on `Set` structures - but _slow_ compared to dedicated options. – Boris the Spider Mar 18 '18 at 19:58
  • Why not just create a new set and add all of the sets to it? – lexicore Mar 18 '18 at 20:02
  • @lexicore because that's still `O(n)`, with a _large_ constant factor. A decent structure that allows for disjoint sets runs on near linear time. – Boris the Spider Mar 18 '18 at 20:03
  • @BoristheSpider But you need to calculate disjoint sets first. – lexicore Mar 18 '18 at 20:05
  • By the way, are the elements of your sets numbers in a certain range? – lexicore Mar 18 '18 at 20:07
  • Calling `addAll` with `Sets.difference` is going to be slower than calling `addAll` with the set directly. – Louis Wasserman Mar 18 '18 at 20:09
  • But the Sets.difference calls happen at construction time, so their cost is not included. We can assume the intersection sets are already given to us. @LouisWasserman – rsc Mar 18 '18 at 20:16
  • The integers here are only an example; our elements are not integers. @Moira – rsc Mar 18 '18 at 20:17
  • @rsc, even so, if you have access to the original sets it'll be faster to add the original sets than to use `Sets.difference`. But frankly, none of these algorithms will make a significant difference either way. – Louis Wasserman Mar 18 '18 at 20:25
  • If our elements were integers in a certain range and we represented each set as a boolean array, would we still have to iterate over all elements? And would that be faster than the solution described above? @Moira – rsc Mar 18 '18 at 20:56
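
On the boolean-array idea raised in the last comment: a rough sketch using `java.util.BitSet`, which only applies if the elements can be mapped to small non-negative integers (the question says the real elements are not integers, so that mapping is an assumption). `BitSet.or` combines the sets 64 bits at a time, so the union itself does not visit elements one by one; its cost grows with the size of the value range rather than with the number of elements. The class and helper names are hypothetical.

```java
import java.util.BitSet;
import java.util.List;

final class BitSetUnionSketch {

    // Builds a BitSet with one bit set per value; values must be non-negative.
    static BitSet toBitSet(int... values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v);
        }
        return bits;
    }

    // Unions all inputs into a fresh BitSet, leaving the inputs unchanged.
    static BitSet unionAll(List<BitSet> sets) {
        BitSet result = new BitSet();
        for (BitSet s : sets) {
            result.or(s); // word-wise OR, 64 values per step
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet s1 = toBitSet(1, 2, 3, 4, 5, 6);
        BitSet s2 = toBitSet(1, 2, 4, 8, 10, 12, 15, 18, 21);
        BitSet s3 = toBitSet(1, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33);

        BitSet union = unionAll(List.of(s1, s2, s3));
        System.out.println(union.cardinality()); // 22 distinct values
    }
}
```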

0 Answers