
I have a NumPy array l1 of shape (81, 2) and another array l2 of shape (8, 2). Every row of l2 also appears in l1. I'm trying to build an array l3 of shape (73, 2) that contains all rows of l1 except the ones in l2 (l3 = l1 - l2), using a list comprehension.

I found many similar questions here, and almost all of them suggest a solution like this to generate l3:

import numpy as np

n = 9
index = np.arange(n)

l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])
l3 = [(i, j) for (i, j) in l1 if (i, j) not in l2]

print(l3)

However, the code above produces an l3 that contains only 20 of the expected 81 - 8 = 73 elements. I don't understand how the list comprehension operates here, or why those particular 20 elements are kept. Can anyone help?

NOTE: many people advise using set() instead of a list comprehension for this problem, but I haven't tried that yet, and I'd really like to understand why the list comprehension fails in the code above.

Qosa
  • Are the elements in each array unique? If so: `np.array(list(set(zip(*l1.T.tolist())).difference(zip(*l2.T.tolist()))))` – Onyambu Apr 28 '22 at 20:50



Let's test the first row of l1:

In [46]: i,j = l1[0]
In [47]: i,j
Out[47]: (0, 0)
In [48]: (i,j) in l2
Out[48]: True

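It's True because 0 occurs in l2: the test isn't done row by row. For an ndarray, `in` checks whether any single element of l2 == (i, j) is True. The tuple is broadcast against every row, so (0, 0) counts as "in" l2 as soon as some row has 0 in its first column or 0 in its second column.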

There isn't a 7 anywhere in l2, so this is False:

In [49]: (7,7) in l2
Out[49]: False
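This also explains the count of 20. A small sketch on the question's data, assuming NumPy's `value in arr` behaves like (arr == value).any(): a pair is dropped when its first number appears anywhere in column 0 of l2 ({0, 2, 4, 8}) or its second number appears anywhere in column 1 ({2, 3, 4, 5, 6}). That leaves 5 choices of i times 4 choices of j. The helper contains_like_numpy is a hypothetical name used only for the comparison.

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

def contains_like_numpy(arr, value):
    # Hypothetical helper: elementwise compare, then True if ANY element matched
    return bool((arr == value).any())

# The original (buggy) list comprehension from the question
l3 = [(i, j) for (i, j) in l1 if (i, j) not in l2]

# Equivalent column-wise rule: i must not occur in column 0 AND
# j must not occur in column 1 for the pair to be kept
col0 = set(l2[:, 0].tolist())
col1 = set(l2[:, 1].tolist())
kept = [(i, j) for (i, j) in l1 if i not in col0 and j not in col1]

print(len(l3), len(kept))  # 20 20
```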

So make sure the membership test in your list comprehension actually tests what you think it does.
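If you want to keep the list comprehension, one option (a sketch, not the only fix) is to test membership against a set of tuples instead of the ndarray, so that whole (i, j) pairs are compared. The name exclude is my own choice.

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# Hashable tuples of Python ints: membership now compares complete pairs
exclude = {tuple(row) for row in l2.tolist()}

# Same shape of list comprehension as in the question, iterating over
# plain Python lists so (i, j) is a tuple of ints
l3 = [(i, j) for (i, j) in l1.tolist() if (i, j) not in exclude]

l3_array = np.array(l3)  # back to an ndarray, rows in l1's original order
print(len(l3), l3_array.shape)  # 73 (73, 2)
```

Set lookups are O(1) on average, so this also avoids scanning all of l2 for every row of l1.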

One way to test for matches is:

In [72]: x = (l1==l2[:,None,:]).all(axis=2).any(axis=0)
In [73]: x
Out[73]: 
array([False, False, False,  True, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False,  True, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False,  True, False, False, False])
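The None index in l2[:,None,:] (the same as np.newaxis) inserts a length-1 axis, turning l2 from (8, 2) into (8, 1, 2). Broadcasting that against l1's (81, 2) compares every row of l2 with every row of l1. The intermediate names below are only for illustration:

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])  # shape (81, 2)
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])  # shape (8, 2)

a = l2[:, None, :]          # add an axis: (8, 1, 2)
eq = l1 == a                # (81, 2) vs (8, 1, 2) broadcasts to (8, 81, 2)
row_match = eq.all(axis=2)  # (8, 81): both coordinates equal
x = row_match.any(axis=0)   # (81,): l1 row equals at least one l2 row

print(a.shape, eq.shape, row_match.shape, x.shape)
# (8, 1, 2) (8, 81, 2) (8, 81) (81,)
```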

This has 8 True values, the ones that exactly match l2:

In [74]: x.sum()
Out[74]: 8
In [75]: l1[x]
Out[75]: 
array([[0, 3],
       [0, 5],
       [2, 4],
       [4, 2],
       [4, 4],
       [4, 6],
       [8, 3],
       [8, 5]])

So the remaining 73 rows are selected with:

In [76]: l1[~x]
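The mask-and-select steps can be wrapped into a small reusable function. This is a sketch; setdiff_rows is a hypothetical name, not a NumPy function.

```python
import numpy as np

def setdiff_rows(a, b):
    """Hypothetical helper: rows of a (m, k) that do not appear in b (p, k)."""
    # (p, 1, k) == (m, k) -> (p, m, k); require all k columns to match,
    # then flag an a-row if it matches any b-row
    matches = (a == b[:, None, :]).all(axis=2).any(axis=0)
    return a[~matches]

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

l3 = setdiff_rows(l1, l2)
print(l3.shape)  # (73, 2)
```

The intermediate boolean array has p * m * k elements, which is fine here (8 * 81 * 2) but grows quickly for large inputs; the set-based approach scales better in that case.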

To work with sets, we need to convert the arrays to lists of tuples, since ndarray rows are not hashable:

In [85]: s1 = set([tuple(x) for x in l1])
In [86]: s2 = set([tuple(x) for x in l2])
In [87]: len(s1.difference(s2))
Out[87]: 73
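To get an array back from the set difference, note that sets are unordered; sorting the tuples is one way (an assumption about what order you want) to get a deterministic (73, 2) result. Because l1 here is already in lexicographic order, the sorted result matches l1's row order. Duplicate rows in l1, if there were any, would be collapsed by the set.

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

s1 = {tuple(row) for row in l1.tolist()}
s2 = {tuple(row) for row in l2.tolist()}

# Sort the unordered difference, then rebuild a 2-D array
l3 = np.array(sorted(s1 - s2))
print(l3.shape)  # (73, 2)
```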

Another approach is to convert the arrays to structured arrays:

In [88]: import numpy.lib.recfunctions as rf
In [102]: r1 = rf.unstructured_to_structured(l1,dtype=np.dtype('i,i'))
In [103]: r2 = rf.unstructured_to_structured(l2,dtype=np.dtype('i,i'))
In [104]: r2
Out[104]: 
array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])

Now isin works: each structured element holds a whole (i, j) pair, so both arrays are 1d and isin compares complete rows instead of individual numbers:

In [105]: np.isin(r1,r2)
Out[105]: 
array([False, False, False,  True, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       ...])
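To finish this approach, the mask can select the non-matching records, and rf.structured_to_unstructured turns them back into a plain (73, 2) array. A sketch on the question's data; note that the 'i,i' dtype stores 32-bit ints, so the result is int32 rather than l1's original integer type.

```python
import numpy as np
import numpy.lib.recfunctions as rf

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# One structured element per row, with two int fields
r1 = rf.unstructured_to_structured(l1, dtype=np.dtype('i,i'))  # shape (81,)
r2 = rf.unstructured_to_structured(l2, dtype=np.dtype('i,i'))  # shape (8,)

mask = np.isin(r1, r2)  # True where the whole pair occurs in r2
print(mask.sum())       # 8

# Keep the non-matching records and convert back to an (n, 2) array
l3 = rf.structured_to_unstructured(r1[~mask])
print(l3.shape)         # (73, 2)
```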
hpaulj
  • I couldn't have asked for a more complete answer, thank you! – Qosa Apr 29 '22 at 08:49
  • I just have a question about this line: In [72]: x = (l1==l2[:,None,:]).all(axis=2).any(axis=0). How come l2 suddenly has 3 dimensions instead of 2? – Qosa Apr 29 '22 at 08:58