How to delete values from one pandas series that are common to another?

Question

I need to delete the elements of one pandas Series (ser1) that also appear in another pandas Series (ser2).

I have tried several things that do not work. The closest thing I found was NumPy's np.intersect1d() function, which operates on arrays. It finds the common values, but when I try to drop the indexes equal to these values, I get a bunch of errors.

I've tried a bunch of other things that did not really work and have been at it for 3 hours now, so I am about to give up.

Here are the two series:

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

The result should be:

print(ser1)
0   1
1   2
2   3

I am sure there is a simple solution.

Corralien · Answer 1 · 2021-10-29T20:05:37.557

7

Use .isin:

>>> ser1[~ser1.isin(ser2)]
0    1
1    2
2    3
dtype: int64

The NumPy equivalent is np.setdiff1d (not np.intersect1d). Note that it returns a sorted array of unique values rather than a Series:

>>> np.setdiff1d(ser1, ser2)
array([1, 2, 3])
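
The two approaches above give the same values for the question's data, but they are not interchangeable in general. The following is a minimal sketch, using hypothetical input with a duplicate and unsorted values (not the asker's data), to show where they differ:

```python
import numpy as np
import pandas as pd

# Hypothetical data: unsorted, with the value 3 appearing twice.
ser1 = pd.Series([3, 1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Boolean mask: keeps the original index labels, order, and duplicates.
kept = ser1[~ser1.isin(ser2)]
print(kept.tolist())        # [3, 1, 2, 3]
print(kept.index.tolist())  # [0, 1, 2, 3]

# np.setdiff1d: returns a sorted NumPy array of unique values, with no index.
diff = np.setdiff1d(ser1, ser2)
print(diff.tolist())        # [1, 2, 3]
```

If the result should stay a Series with its original labels, the mask is probably the safer choice; np.setdiff1d fits when only the distinct values matter.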

edited Oct 29 '21 at 20:05

answered Oct 29 '21 at 19:59

Corralien


score 5 · Accepted Answer · answered Oct 29 '21 at 20:06


A NumPy alternative is np.isin:

import pandas as pd
import numpy as np

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

res = ser1[~np.isin(ser1, ser2)]
print(res)

Micro-Benchmark

import pandas as pd
import numpy as np
ser1 = pd.Series([1, 2, 3, 4, 5] * 100)
ser2 = pd.Series([4, 5, 6, 7, 8] * 10)
%timeit res = ser1[~np.isin(ser1, ser2)]
136 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit res = ser1[~ser1.isin(ser2)]
209 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit pd.Index(ser1).difference(ser2).to_series()
277 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
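
The %timeit lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard-library timeit module. The `candidates` dictionary and the loop count of 1000 are choices made for this illustration, and the timings will vary by machine and library version:

```python
import timeit

import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5] * 100)
ser2 = pd.Series([4, 5, 6, 7, 8] * 10)

# Illustrative mapping from a label to the expression being timed.
candidates = {
    "np.isin": lambda: ser1[~np.isin(ser1, ser2)],
    "Series.isin": lambda: ser1[~ser1.isin(ser2)],
    "Index.difference": lambda: pd.Index(ser1).difference(ser2).to_series(),
}

loops = 1000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=loops)
    print(f"{name}: {seconds / loops * 1e6:.1f} us per call")
```

One caveat when reading the numbers: on this repeated data the three expressions do not return the same thing. The two isin variants keep all 300 matching rows, while Index.difference returns only the 3 unique values.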

answered Oct 29 '21 at 20:06

Dani Mesejo


I think you should throw in python's set here as well – sammywemmy Oct 29 '21 at 20:17
@sammywemmy Not sure how to use set here, you mean convert to set and back to series? – Dani Mesejo Oct 29 '21 at 20:19
``set(ser1).difference(ser2)`` ... danger with set though is that it is unordered – sammywemmy Oct 29 '21 at 20:19

Works, thanks for your time! – JacobMarlo Oct 29 '21 at 20:20

@sammywemmy Set is really fast, but is not ordered. For the same values it gives 29us – Dani Mesejo Oct 29 '21 at 20:26
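
The set idea from the comments above can be sketched as follows. The second half is one possible way to keep the original order and index, by using the set only for membership tests; the names `exclude` and `res` are chosen for this illustration, and this variant was not benchmarked in the thread:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Plain set difference: the result has no index and no guaranteed order.
unordered = set(ser1).difference(ser2)

# Order-preserving variant: build the set once, then filter ser1 with a mask.
exclude = set(ser2)
res = ser1[[value not in exclude for value in ser1]]
print(res.tolist())        # [1, 2, 3]
print(res.index.tolist())  # [0, 1, 2]
```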

score 2 · Answer 3 · answered Oct 29 '21 at 20:02


You can use the set operations on pd.Index, although I am not sure how the speed compares to isin:

pd.Index(ser1).difference(ser2).to_series()
Out[35]: 
1    1
2    2
3    3
dtype: int64

answered Oct 29 '21 at 20:02

sammywemmy


This works well thank you, is there a reason as to why the index of the new series doesn't start at 0 though? – JacobMarlo Oct 29 '21 at 20:05

ahhh ... so the index is repeated, both as an index, and as a Series. within the `to_series` method, you can manually pass in the new index. or just reset_index – sammywemmy Oct 29 '21 at 20:06
To reset the index I used: `ser1 = pd.Index(ser1).difference(ser2).to_series(); ser1 = ser1.reset_index(); print(ser1)` and it gave me this as an answer: `Name: alphabets, dtype: object index 0 0 1 1 1 2 2 2 3 3` – JacobMarlo Oct 29 '21 at 20:16

use ``reset_index(drop=True)`` – sammywemmy Oct 29 '21 at 20:17
Works, thank you for your answer, much appreciated! – JacobMarlo Oct 29 '21 at 20:19
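
The index discussion in the comments above can be sketched like this. to_series() copies the values into the index, which is why the labels start at 1; reset_index(drop=True) replaces them with 0, 1, 2. The variable names are chosen for this illustration:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

idx = pd.Index(ser1).difference(ser2)

# By default, to_series() uses the Index values as both labels and data.
diff = idx.to_series()
print(diff.index.tolist())  # [1, 2, 3]

# drop=True discards the old labels instead of turning them into a column.
res = diff.reset_index(drop=True)
print(res.index.tolist())   # [0, 1, 2]
print(res.tolist())         # [1, 2, 3]

# Alternative: pass the desired index to to_series() directly.
res2 = idx.to_series(index=pd.RangeIndex(len(idx)))
```

Without drop=True, reset_index() returns a DataFrame with the old labels as an extra column, which matches the two-column output reported in the comment.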
