
If I have two Series objects, like so: [0,0,1] and [1,0,0], how would I get the intersection and union of the two? They contain only booleans, so the values are not unique.

I have a large Boolean matrix. I've minhashed it, and now I'm trying to find the false positives and false negatives, which I think means I have to compute the Jaccard similarity for each original pair.

user3927312

1 Answer


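Since you say they are booleans, you can use NumPy's logical_and and logical_or, or the & and | operators directly on the Series. The Jaccard similarity is then the number of True values in the intersection divided by the number of True values in the union, i.e.: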

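import numpy as np
import pandas as pd

y1 = pd.Series([1, 0, 1, 0])
y2 = pd.Series([1, 0, 0, 1])

# NumPy approach: element-wise logical AND / OR, then count the True values
intersection = np.logical_and(y1.values, y2.values)
union = np.logical_or(y1.values, y2.values)
intersection.sum() / union.sum()
# 0.3333333333333333

# Pandas approach: & and | act element-wise on the Series
(y1 & y2).sum() / (y1 | y2).sum()
# 0.3333333333333333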
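
Since the question mentions a whole Boolean matrix, here is a minimal sketch of how the same idea might extend to every pair at once. The names jaccard and pairwise_jaccard are hypothetical helpers, not part of NumPy or pandas. The sketch assumes the vectors being compared are the columns of the matrix, and that two all-False vectors should score 1.0 (a convention chosen here; other choices such as 0.0 or NaN are equally defensible).

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two equal-length boolean vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both vectors are all False; 1.0 is an assumed convention.
        return 1.0
    return np.logical_and(a, b).sum() / union

def pairwise_jaccard(matrix):
    """Jaccard similarity between every pair of columns (hypothetical helper)."""
    m = np.asarray(matrix, dtype=bool).astype(np.int64)
    inter = m.T @ m              # inter[i, j]: rows where columns i and j are both True
    counts = np.diag(inter)      # counts[i]: number of True values in column i
    union = counts[:, None] + counts[None, :] - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        # Where the union is empty, fall back to the assumed value 1.0.
        return np.where(union > 0, inter / union, 1.0)

# Columns are the y1 and y2 vectors from the example above.
sim = pairwise_jaccard(np.array([[1, 1], [0, 0], [1, 0], [0, 1]]))
```

Here sim[i, j] holds the similarity between columns i and j, so the matrix is symmetric. The matrix product builds a dense n-by-n result, so for a very large number of columns you may need to process the pairs in chunks instead.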
Bharath M Shetty