Finding intersection/difference between python lists

Question

I have two python lists:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

b = ['the', 'when', 'send', 'we', 'us']

I need to remove every element of a whose first item appears in b. In this case, I should get:

c = [('why', 4), ('throw', 9), ('you', 1)]

What would be the most efficient way to do this?

Why not use the method intersection? It works off sets but you can probably make it work better ;) — Henrik Andersson, Feb 23 '13 at 09:34
Why is this question tagged with numpy? Do you need a numpy solution? — bmu, Feb 24 '13 at 10:12

score 11 · Accepted Answer · answered Feb 23 '13 at 09:30

A list comprehension will work.

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
filtered = [i for i in a if i[0] not in b]

>>> print(filtered)
[('why', 4), ('throw', 9), ('you', 1)]

Octipi


This is a much more elegant way of doing it while keeping the lists as lists, and not treating them as dicts. Thank you for the help. – khan Feb 23 '13 at 11:43
You should convert `b` to a `set` if you are using the `in` operator. It changes the lookup time from linear to constant, which will make a huge difference when `b` is a long list. So, `c = set(b)` and then `filtered = [i for i in a if not i[0] in c]`. Note that `b` became `c` in the last line. Even on this short list with 5 items, it results in a 25% speed improvement for me. With a longer list (100 items in `b`), it results in a 90% speed improvement. – Carl Apr 17 '20 at 11:44
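A minimal sketch of the set-based variant suggested in the comment above. The name `excluded` is introduced here for illustration only and does not come from the original answer.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Build the set once, outside the comprehension, so that each membership
# test is an average-case constant-time hash lookup instead of a list scan.
excluded = set(b)
filtered = [pair for pair in a if pair[0] not in excluded]

print(filtered)
```

The order of `a` is preserved because only `b` is turned into a set.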

Blender · Answer 2 · 2013-02-23T15:07:45.433

5

A list comprehension should work:

c = [item for item in a if item[0] not in b]

Or with a dictionary comprehension:

d = dict(a)
c = {key: value for key, value in d.items() if key not in b}  # note: c is a dict here

edited Feb 23 '13 at 15:07

answered Feb 23 '13 at 09:28


Did you want `{key: value for key, value in d.iteritems() if key not in b}`? – Eric Feb 23 '13 at 09:42
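Note that the dictionary comprehension yields a dict rather than the list of tuples the question asks for. A rough sketch of converting back, assuming the first items in `a` are unique (duplicates would collapse when building the dict); the name `kept` is made up for this example.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

d = dict(a)  # assumption: keys are unique, otherwise later pairs overwrite earlier ones
kept = {key: value for key, value in d.items() if key not in b}

# Turn the filtered dict back into a list of (key, value) tuples.
c = list(kept.items())
print(c)
```

On Python 3.7 and later, dicts keep insertion order, so `c` should come out in the same order as `a`.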

score 2 · Answer 3 · answered Feb 23 '13 at 11:11

in is nice, but you should use a set, at least for b. If you have numpy, you could also try np.in1d, but you should measure whether it is actually faster.

# Same comprehension as in the other answers, but with b as a set.
b = set(b)
filtered = [i for i in a if i[0] not in b]

# With numpy: a structured array needs the maximum string length up front
# (here 10). Otherwise, just use an object array; it is slower (likely not
# worth it), but safe.
import numpy as np
a = np.array(a, dtype=[('key', 'U10'), ('val', int)])
b = np.asarray(list(b))  # b is a set at this point, so go through a list

mask = ~np.in1d(a['key'], b)  # newer numpy versions spell this np.isin
filtered = a[mask]

Sets also have methods such as difference, which are probably not too useful here, but are in general.
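For what it's worth, here is a small sketch of how the set methods mentioned above (and the intersection idea from the question comments) could be applied to this problem. The names `keys`, `remaining`, and `common` are invented for illustration.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

keys = {word for word, _ in a}

remaining = keys.difference(b)   # words in a that are not in b
common = keys.intersection(b)    # words present in both a and b

# Sets are unordered, so go back through a to keep the original order
# and to recover the numbers attached to each word.
c = [pair for pair in a if pair[0] in remaining]
print(c)
```

This ends up doing a comprehension anyway, which is probably why the set methods alone are not the most direct tool when the elements are tuples.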

+1 for numpy. Didn't see your answer before posting mine. `in1d` is faster than the list comprehension for larger data sets by a factor of 2. — bmu, Feb 24 '13 at 10:41

score 2 · Answer 4 · answered Feb 24 '13 at 10:37

As this is tagged with numpy, here is a numpy solution using numpy.in1d benchmarked against the list comprehension:

In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

In [2]: b = ['the', 'when', 'send', 'we', 'us']

In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [4]: b_ar = np.array(b)

In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop

In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop

So for 5 records the list comprehension is faster.

However for large data sets the numpy solution is twice as fast as the list comprehension:

In [7]: a = a * 1000

In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop

In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
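The %timeit lines above are IPython magics. Outside IPython, a comparable measurement can be sketched with the standard timeit module. This version only compares the list lookup against a set lookup (no numpy), and the helper names and the choice of number=100 are arbitrary assumptions; timings will vary by machine and Python version.

```python
import timeit

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']
b_set = set(b)


def with_list():
    # Membership test scans the list b for every element of a.
    return [i for i in a if i[0] not in b]


def with_set():
    # Membership test is a hash lookup in b_set.
    return [i for i in a if i[0] not in b_set]


# Both variants must agree before their timings are worth comparing.
assert with_list() == with_set()

t_list = timeit.timeit(with_list, number=100)
t_set = timeit.timeit(with_set, number=100)
print("list lookup: %.4f s, set lookup: %.4f s" % (t_list, t_set))
```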

Arpit · Answer 5 · 2013-02-23T09:43:14.457

0

Try this :

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

b = ['the', 'when', 'send', 'we', 'us']

c=[]

for x in a:
    if x[0] not in b:
        c.append(x)
print(c)

Demo: http://ideone.com/zW7mzY

edited Feb 23 '13 at 09:43

answered Feb 23 '13 at 09:30


Backwards: the OP wants `c` to contain the things _not_ in `b` – Eric Feb 23 '13 at 09:37
This seems to be the "`c++` way", not the "`python` way" ;) – yo' Feb 23 '13 at 09:37
@tohecz c++ doesn't support `in` operator. – Arpit Feb 23 '13 at 09:40
@Arpit No, but essentially uses loops for container manipulations, which Python essentially _ought not to_. – yo' Feb 23 '13 at 09:53
I'm still rooting for intersection! :] – Henrik Andersson Feb 23 '13 at 09:58

score 0 · Answer 6 · answered Apr 17 '20 at 11:11

Easy way

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = []  # a list to store the required tuples
# compare the first element of each tuple in a with the elements of b
for i in a:
    if i[0] not in b:
        c.append(i)
print(c)

score -1 · Answer 7 · answered Feb 23 '13 at 09:33

Use filter:

c = filter(lambda (x, y): False if x in b else True, a)

Rahul Banerjee


**-1**: If you're using `False if .. else True` or `True if ... else False` then you're doing it wrong – Eric Feb 23 '13 at 09:36
Wrong according to a certain "Python style", or wrong due to some other reason? – Rahul Banerjee Feb 23 '13 at 11:22
`X in Y` itself is a boolean statement in python – thkang Feb 23 '13 at 11:35
Wrong in the same way as `if x > 1 == True:`, `if (x > 1 == True) == True:` or `if condition: b = True\n else: b = False` – Eric Feb 23 '13 at 11:36
@RahulBanerjee `False if ... else True` is needlessly complex and hard to read - just do `lambda (x, y): x not in b`. Also, this causes a syntax error in Python 3 - you would have to do `lambda x: x[0] not in b` because the form of argument unpacking you use is no longer part of the language. – lvc Feb 23 '13 at 11:39
@Eric: I think you are complaining that it is hard-to-read, but I find it easy to read. – Rahul Banerjee Feb 24 '13 at 00:44
@lvc: Your answer makes sense (i.e., this syntax is illegal in Python 3). Thanks! – Rahul Banerjee Feb 24 '13 at 00:45
Part of the problem here is that `filter(lambda:...` is inherently hard to read (vs, say, a filtered comprehension). Presumably, you prefer your notation because it includes an `if`. – Eric Feb 24 '13 at 01:26
Fair enough. And that might explain why I didn't come up with `lambda (x, y): x not in b` in the first place (which is illegal in 3.0 anyway). – Rahul Banerjee Feb 25 '13 at 06:34
@Eric Oh really, "filter(lambda" is hard to read? That is probably only your opinion; it really depends on experience. However, there is only one disadvantage of using filter with lambdas: its performance. Generators and a plain if work faster. – Reishin Aug 11 '16 at 16:09
@Reishin: It's more than just my opinion - Guido himself wanted to remove them. See a related post [here](http://stackoverflow.com/a/3013722/102441) – Eric Aug 13 '16 at 00:55
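To summarize the thread above for Python 3 readers, a sketch of the filter approach that avoids both the tuple-unpacking lambda and the `False if ... else True` construct might look like this. The parameter name `pair` is just an illustrative choice.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Python 3 lambdas cannot unpack tuple arguments, so index into the pair.
# filter() returns a lazy iterator in Python 3; list() materializes it.
c = list(filter(lambda pair: pair[0] not in b, a))
print(c)
```

Whether this reads better than the equivalent list comprehension is, as the comments show, a matter of taste.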
