45

I am using a set operation in Python to compute the symmetric difference between two numpy arrays. The result, however, is a set, and I need to convert it back to a numpy array to move forward. Is there a way to do this? Here's what I tried:

import numpy
a = numpy.array([1,2,3,4,5,6])
b = numpy.array([2,3,5])
c = set(a) ^ set(b)

The result is a set:

In [27]: c
Out[27]: set([1, 4, 6])

If I pass the set to numpy.array, the entire set ends up as a single object element:

In [28]: numpy.array(c)
Out[28]: array(set([1, 4, 6]), dtype=object)

What I need, however, would be this:

array([1,4,6],dtype=int)

I could loop over the elements to convert one by one, but I will have 100,000 elements and hoped for a built-in function to save the loop. Thanks!

mishaF

4 Answers

65

Do:

>>> numpy.array(list(c))
array([1, 4, 6])

The resulting dtype is int (int64 on my machine).
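
One caveat worth keeping in mind: a set has no defined order, so the element order of the resulting array is not guaranteed. A minimal sketch, assuming you want the values in ascending order (the `sorted` call is an addition, not part of the original answer):

```python
import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([2, 3, 5])
c = set(a) ^ set(b)

# Sets are unordered, so sort explicitly if the element order matters.
result = numpy.array(sorted(c))
print(result)        # [1 4 6]
print(result.dtype)  # platform-dependent integer type, e.g. int64
```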

tito
39

Don't convert the numpy array to a set to perform exclusive-or. Use setxor1d directly.

>>> import numpy
>>> a = numpy.array([1,2,3,4,5,6])
>>> b = numpy.array([2,3,5])
>>> numpy.setxor1d(a, b)
array([1, 4, 6])
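
As a slightly fuller sketch of the same call: `setxor1d` returns the sorted values that occur in exactly one of the two inputs, and it accepts an `assume_unique` flag. Whether that flag helps in practice depends on your data, so treat the second call as an option to measure rather than a recommendation.

```python
import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([2, 3, 5])

# Sorted values that appear in exactly one of the two arrays.
c = numpy.setxor1d(a, b)
print(c)  # [1 4 6]

# Only valid when neither input contains duplicates (true for a and b here);
# it lets numpy skip the de-duplication step.
c_unique = numpy.setxor1d(a, b, assume_unique=True)
print(c_unique)  # [1 4 6]
```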
kennytm
  • Is using the numpy set routines, such as `setxor1d`, bad when the number of comparisons is large? Maybe a new question, but very related to this answer I think. In my case I will have 100k objects with 10m+ set operations. – AnnanFay Jun 02 '16 at 20:24
15

Try:

numpy.fromiter(c, int, len(c))

This is about twice as fast as the solution that builds an intermediate list.
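
If you want to check the speed difference on your own machine, here is a rough sketch using `timeit`. The set size of 100,000 is taken from the question; the repetition count is an arbitrary choice, and the actual ratio will vary with the numpy version and hardware.

```python
import timeit
import numpy

# Stand-in for the real set; 100,000 elements as mentioned in the question.
c = set(range(100_000))

via_list = numpy.array(list(c))
# count=len(c) lets numpy allocate the output array once, up front.
via_fromiter = numpy.fromiter(c, dtype=int, count=len(c))

# Both approaches iterate the same unmodified set, so the results match.
print(numpy.array_equal(via_list, via_fromiter))

t_list = timeit.timeit(lambda: numpy.array(list(c)), number=10)
t_iter = timeit.timeit(lambda: numpy.fromiter(c, dtype=int, count=len(c)), number=10)
print("list:", t_list, "fromiter:", t_iter)
```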

tstanisl
11

Try this.

numpy.array(list(c))

Converting the set to a list first makes numpy treat each element as a separate integer entry, instead of storing the whole set as a single object.
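
A small sketch of the difference, using a plain Python set as a stand-in for `c`: numpy does not treat a set as a sequence, so it wraps it in a zero-dimensional object array, whereas a list is unpacked element by element.

```python
import numpy

c = {1, 4, 6}

# The set is stored as one opaque Python object in a 0-d array.
wrapped = numpy.array(c)
print(wrapped.shape, wrapped.dtype)  # () object

# A list is unpacked into a 1-d integer array.
unpacked = numpy.array(list(c))
print(unpacked.shape)  # (3,)
```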

Abhijit