0

I'm trying to filter data with multiple conditions using .isin

I've created a dataframe with data like this.

    col_a   col_b   col_c
    abc     yes     a
    abc     no      b
    abc     yes     a
    def     no      b
    def     yes     a
    def     no      b
    def     yes     a
    def     no      b
    ghi     yes     a
    ghi     no      b
    ghi     yes     a

When I try this type of filtering, referring to this solution I seen on stack overflow I get back all NaN values. Pandas: Filtering multiple conditions

How can I apply the three conditions to filter?

fil_1 = test.isin({'col_a': ['abc','def','ghi']})
fil_2 = test.isin({'col_b': ['yes']})
fil_3 = test.isin({'col_c' :['a']})
data = test[fil_1 & fil_2 & fil_3]
data
tomoc4
  • 337
  • 2
  • 10
  • 29
  • `fil_1 & fil_2 & fil_3` is False for all cell in your DataFrame, so yuo get a DataFrame with NaN values. Are you sure you want use `&`? – ansev Feb 06 '20 at 10:38
  • I'm not sure I get you I can see that the row two should even be returned in the code above as, 'abc' is in col_1, 'yes ' is in col_2 and 'a' is in col_3 – tomoc4 Feb 06 '20 at 10:43

3 Answers3

1

You need:

fil_1 = test['col_a'].isin(['abc','def','ghi'])
fil_2 = test['col_b'].isin(['yes'])
fil_3 = test['col_c'].isin(['a'])

or

test.isin({'col_a': ['abc','def','ghi'],
           'col_b': ['yes'],
           'col_c' :['a']}).all(axis = 1)

df_filtered = test[fil_1 & fil_2 & fil_3]
print(df_filtered)
   col_a col_b col_c
0    abc   yes     a
2    abc   yes     a
4    def   yes     a
6    def   yes     a
8    ghi   yes     a
10   ghi   yes     a

or logic |

fil = test.isin({'col_a': ['abc','def','ghi'],'col_b': ['yes'],'col_c' :['a']})
df_filtered = df[fil]
print(df_filtered)

   col_a col_b col_c
0    abc   yes     a
1    abc   NaN   NaN
2    abc   yes     a
3    def   NaN   NaN
4    def   yes     a
5    def   NaN   NaN
6    def   yes     a
7    def   NaN   NaN
8    ghi   yes     a
9    ghi   NaN   NaN
10   ghi   yes     a

Now if we also use DataFrame.all:

df_filtered = df[fil.all(axis = 1)]
print(df_filtered)
   col_a col_b col_c
0    abc   yes     a
2    abc   yes     a
4    def   yes     a
6    def   yes     a
8    ghi   yes     a
10   ghi   yes     a

Detail

print(fil)
    col_a  col_b  col_c
0    True   True   True
1    True  False  False
2    True   True   True
3    True  False  False
4    True   True   True
5    True  False  False
6    True   True   True
7    True  False  False
8    True   True   True
9    True  False  False
10   True   True   True

print(test.isin({'col_a': ['abc','def','ghi']}))
    col_a  col_b  col_c
0    True  False  False
1    True  False  False
2    True  False  False
3    True  False  False
4    True  False  False
5    True  False  False
6    True  False  False
7    True  False  False
8    True  False  False
9    True  False  False
10   True  False  False

this return False in columns differences than col_a so you got NaN values ​​because you were using &

ansev
  • 30,322
  • 5
  • 17
  • 31
0

Here's the one-liner solution,

test[test.col_a.isin(['abc','def','ghi']) & test.col_b.isin(['yes']) & test.col_c.isin(['a'])]
Rohit Lal
  • 2,791
  • 1
  • 20
  • 36
0

A possible solution to filter your dataframe is the following: "cond1" select all the values in col_a that are either "abc" or "def" or "ghi". Then col_b is only "yes", and col_c is only "a".

cond1=(apd.col_a=="abc") | (apd.col_a=="def") | (apd.col_a=="ghi")

apd[ cond1 & (apd.col_b=="yes") & (apd.col_c=="a")]

Result:

    col_a   col_b   col_c
0   abc     yes     a
2   abc     yes     a
4   def     yes     a
6   def     yes     a
8   ghi     yes     a
10  ghi     yes     a
Fabrizio
  • 927
  • 9
  • 20