I have a numpy array of strings, some duplicated, and I'd like to compare every element with every other element to produce a new vector of 1s and 0s indicating whether the elements of each pair (i,j) are the same or different.
e.g. ["a","b","a","c"]
-> 16-element (4*4) vector [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
Is there a way to do this quickly in numpy without a double loop through all pairs of elements? My array has ~240,000 elements, so it's taking a terribly long time to do it the naive way.
I'm aware of numpy.equal.outer, but apparently numpy.equal is not implemented on strings, so it seems like I'll need some more clever way to compare them.
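To make the desired output concrete, here is a minimal sketch of one possible approach, not necessarily the fastest one. It assumes it is acceptable to first map each distinct string to an integer code with numpy.unique, so that numpy.equal.outer can be applied to integers instead of strings. The function name pairwise_equal is made up for illustration.

```python
import numpy as np

def pairwise_equal(values):
    """Return a flat 0/1 vector over all ordered pairs (i, j), row by row."""
    arr = np.asarray(values)
    # Map each distinct string to an integer code; codes[i] identifies arr[i].
    _, codes = np.unique(arr, return_inverse=True)
    codes = codes.ravel()
    # Integers are supported by the ufunc's outer method, unlike the strings.
    return np.equal.outer(codes, codes).astype(np.uint8).ravel()

print(pairwise_equal(["a", "b", "a", "c"]))
# [1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1]
```

This reproduces the example vector above. Note that with ~240,000 elements the full result would have 240,000 * 240,000 = 57.6 billion entries, so even at one byte per entry it probably won't fit in memory in one piece and would likely need to be built in row chunks.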