
I need to filter a pandas DataFrame with DataFrame.where, using a condition on a reference column or on a reference index label (row).

Filtering by a column condition works, but the analogous approach fails when the condition comes from an index label (row).

My question is: is this expected behavior? If so, how can I apply the filter based on a row?

import pandas as pd
import numpy as np
from pandas import DataFrame

# Build a 4x4 frame of random integers with columns w..z and index a..d
mydict = {}
cols = 4
rows = 4
for i in range(cols):
    mydict[chr(ord('w') + i)] = np.random.randint(0, 100, rows)
df = DataFrame(mydict, index=[chr(97 + x) for x in range(rows)])
print(df)
print("Filter all data if the column:w has even data ... WORKING")
print(df.loc[:, 'w'] % 2 == 0)
print(df.where(lambda x: x.loc[:, 'w'] % 2 == 0))

print("Filter all data if the index:a has even data ... NOT WORKING")
print(df.loc['a', :] % 2 == 0)
print(df.where(lambda x: x.loc['a', :] % 2 == 0, axis=1))
print(df.where(lambda x: x.loc['a', :] % 2 == 0, axis=0))
pd.__version__

Result:

    w   x   y   z
a  42  98  74  51
b  69  82  70  40
c  93   7  78  45
d  22  61  70   4
Filter all data if the column:w has even data ... WORKING
a     True
b    False
c    False
d     True
Name: w, dtype: bool
      w     x     y     z
a  42.0  98.0  74.0  51.0
b   NaN   NaN   NaN   NaN
c   NaN   NaN   NaN   NaN
d  22.0  61.0  70.0   4.0
Filter all data if the index:a has even data ... NOT WORKING
w     True
x     True
y     True
z    False
Name: a, dtype: bool
    w   x   y   z
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
d NaN NaN NaN NaN
    w   x   y   z
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
d NaN NaN NaN NaN

'0.21.1'
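Because the frame above is filled with np.random.randint, its numbers change on every run. The sketch below is an illustrative addition, not part of the original post: it hard-codes the values printed under Result and repeats the two plain where calls (the axis variants are left out), so the behavior can be checked deterministically.

```python
import pandas as pd

# Fixed copy of the random frame printed in the Result section above.
df = pd.DataFrame(
    {
        "w": [42, 69, 93, 22],
        "x": [98, 82, 7, 61],
        "y": [74, 70, 78, 70],
        "z": [51, 40, 45, 4],
    },
    index=["a", "b", "c", "d"],
)

# Column condition: a boolean Series indexed by the row labels a..d.
by_column = df.where(lambda frame: frame["w"] % 2 == 0)

# Row condition: a boolean Series indexed by the column labels w..z.
by_row = df.where(lambda frame: frame.loc["a"] % 2 == 0)

print(by_column)
print(by_row)
```

With these values, by_column keeps rows a and d and blanks out b and c, while by_row comes back entirely NaN, matching the output shown above.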

Reference:

https://stackoverflow.com/a/44736467/3598703

Boying
  • This sounds like a bug – Bharath M Shetty Dec 21 '17 at 08:06
  • What is your expected output? The first example works because your objects are like-indexed (['a', 'b', 'c', 'd']). In the second example, the DataFrame index is ['a', 'b', 'c', 'd'] but the index of what is returned by your .loc call is ['w', 'x', 'y', 'z'], which is why you are getting all NA values – Will Ayd Dec 22 '17 at 02:34
  • @WillAyd I passed axis=1 and axis=0, so the Series should be applied in different directions, but in the end there is no difference – Boying Dec 22 '17 at 05:38
  • Can you post the output you are expecting? I'm not sure you have the best approach here, but I'm also not entirely clear on your end game. Posting that will help – Will Ayd Dec 23 '17 at 01:30
  • @WillAyd https://github.com/pandas-dev/pandas/issues/18904 I opened a GitHub issue with a minimal example and the expected result; you can follow it there if you like. – Boying Dec 25 '17 at 08:47
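Will Ayd's point about label alignment can be checked directly. The following sketch is an illustrative addition that reuses the values from the question's printed output. It compares the labels of each condition Series with df.index; Series.reindex is used here only to approximate how where lines up a Series condition against the rows, not as a description of pandas internals.

```python
import pandas as pd

# Fixed copy of the frame printed in the question.
df = pd.DataFrame(
    {
        "w": [42, 69, 93, 22],
        "x": [98, 82, 7, 61],
        "y": [74, 70, 78, 70],
        "z": [51, 40, 45, 4],
    },
    index=["a", "b", "c", "d"],
)

col_cond = df["w"] % 2 == 0      # labels a, b, c, d: the same labels as df.index
row_cond = df.loc["a"] % 2 == 0  # labels w, x, y, z: the same labels as df.columns

print(col_cond.index.equals(df.index))         # True
print(row_cond.index.intersection(df.index))   # no shared labels

# Reindexing onto df.index approximates the row alignment applied to a Series
# condition: matching labels keep their value, missing labels become NaN, and
# a missing condition ends up treated as False (the cell is replaced with NaN).
print(row_cond.reindex(df.index))
```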

1 Answer


This might be a bug. Transposing twice has much the same effect as passing axis. A workaround is:

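df.T.where(df.loc['a', :] % 2 == 0).T
# After .T the column labels w..z become the row index, so the condition Series
# (indexed by w..z) aligns with it. This should be the same as passing axis=1,
# so I guess it is probably a bug.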

   w     x     y     z
a NaN  80.0  18.0  14.0
b NaN  98.0  12.0  26.0
c NaN  22.0  51.0  81.0
d NaN  57.0  99.0  23.0
Bharath M Shetty
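The sketch below is an illustrative addition that puts the answer's transpose workaround next to two alternatives on the fixed values from the question. The second option (a NumPy-tiled boolean mask) and the third (boolean column selection with .loc) are swapped-in techniques, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Fixed copy of the frame printed in the question (the question itself uses random data).
df = pd.DataFrame(
    {
        "w": [42, 69, 93, 22],
        "x": [98, 82, 7, 61],
        "y": [74, 70, 78, 70],
        "z": [51, 40, 45, 4],
    },
    index=["a", "b", "c", "d"],
)

# Boolean Series indexed by the column labels w..z (True where row "a" is even).
row_cond = df.loc["a"] % 2 == 0

# 1) The answer's workaround: after .T the column labels become the row index,
#    so row_cond aligns with it; a second .T restores the original orientation.
via_transpose = df.T.where(row_cond).T

# 2) Full-shape boolean mask: repeat row_cond once per row of df. A plain ndarray
#    is not label-aligned, so this relies on row_cond being in df.columns order,
#    which holds here because it was computed from df.loc["a"].
mask = np.tile(row_cond.to_numpy(), (len(df), 1))
via_mask = df.where(mask)

# 3) If dropping the failing columns is acceptable (rather than filling them with
#    NaN), a boolean Series can select columns directly in .loc.
selected = df.loc[:, row_cond]

print(via_transpose)
print(via_mask)
print(selected)
```

Options 1 and 2 produce the same values (column z becomes NaN), although the resulting dtypes may differ between them and across pandas versions. The tiled mask avoids the double transpose, which on a frame with mixed column dtypes can upcast the data (for example to object); the trade-off is that the mask is positional rather than label-aligned.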