Most efficient way to compute differences and intersection of two sets in Python

Question

Let's say we have two sets s1 and s2.

I require three different sets based on these two sets:

Set of elements that exist in s1 but not in s2.
Set of elements that exist in s2 but not in s1.
Set of elements that exist in both s1 and s2.

These could be easily computed as follows:

s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7}

o1 = s1 - s2
o2 = s2 - s1
o3 = s1 & s2

Is there a way to compute these sets more efficiently? I imagine the different set operations have multiple internal processing steps in common, so there might be redundancy.

I don't think its possible to do in less then `O(n)`. Which is what python does, since lookup in sets is `O(1)`. Don't need to worry about unless the operation actually takes significant time. — Equinox, Dec 21 '21 at 12:30
I would trust the implementors of the standard library to have done a better work (more efficient) than what I could do in plain Python. What are you really after? — Serge Ballesta, Dec 21 '21 at 12:49

score 1 · Accepted Answer · answered Dec 21 '21 at 14:12

Given that these operations are implemented in C in the core part of the language, I think there's pretty much nothing you can do to speed these operations up in custom-written code.

But...

Since s1 - s2 is the same as s1 - (s1 & s2), you could calculate o3 first and use this smaller set in the difference operations:

s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7}

o3 = s1 & s2
o1 = s1 - o3
o2 = s2 - o3

I doubt it will make much of a difference, since set lookup and insertion are O(1), but it's conceivable that operations over smaller sets are still slightly faster.

Most efficient way to compute differences and intersection of two sets in Python

1 Answers1