Generic function for conditional filtering in a pandas DataFrame

Question

Sample filtering conditions:

Data

Now I want to filter the given data with the conditions specified above. For that I need a generic function, i.e. one that works for any filters, not only for the ones specified above.

I know how to filter data manually in Python with more than one condition.

I think the generic function may need two arguments: one is the data and the other is the filtering condition.

But I can't figure out the logic for writing a generic function to filter the data.

Could anyone help me tackle this?

Thanks in advance.

jezrael · Answer 1 · score 2 · 2018-10-03T09:26:35.363

You can create a list of conditions and then combine them with np.logical_and.reduce:

import numpy as np
import pandas as pd

# one boolean mask per condition
x1 = df.x == 1
y2 = df.y == 2
z1 = df.z == 1
y3 = df.y == 3

m1 = np.logical_and.reduce([x1, y2, z1])
m2 = np.logical_and.reduce([x1, y3, z1])
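To see the masks in action, here is a self-contained run on hypothetical sample data (the question's data is not reproduced on this page). The array returned by np.logical_and.reduce can be used directly to index df:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's data
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})

x1 = df.x == 1
y2 = df.y == 2
z1 = df.z == 1

# np.logical_and.reduce ANDs the masks element-wise and returns a NumPy bool array
m1 = np.logical_and.reduce([x1, y2, z1])

# A boolean array of the same length as df selects the matching rows
print(df[m1])
```

With this data only the row with index 3 satisfies all three conditions.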

Or concatenate all the masks together and check whether every value in a row is True with DataFrame.all:

m1 = pd.concat([x1, y2, z1], axis=1).all(axis=1)
m2 = pd.concat([x1, y3, z1], axis=1).all(axis=1)
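A sketch of the DataFrame.all variant on the same kind of hypothetical data. Here m1 and m2 are index-aligned boolean Series, so rows matching either set of conditions can be selected by combining them with `|`:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's data
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})

x1 = df.x == 1
y2 = df.y == 2
z1 = df.z == 1
y3 = df.y == 3

# One column per mask; all(axis=1) is True only where every mask is True
m1 = pd.concat([x1, y2, z1], axis=1).all(axis=1)
m2 = pd.concat([x1, y3, z1], axis=1).all(axis=1)

# Rows that satisfy the first OR the second set of conditions
res = df[m1 | m2]
print(res)
```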

EDIT:

If possible, define the column names and the values to filter on in a dictionary:

d1 = {'x':1, 'y':2, 'z':1}
d2 = {'x':1, 'y':3, 'z':1}

m1 = np.logical_and.reduce([df[k] == v for k, v in d1.items()])
m2 = np.logical_and.reduce([df[k] == v for k, v in d2.items()])
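The dictionary comprehension above can be wrapped in the two-argument function the question asks for. A minimal sketch, with the hypothetical name filter_by_dict and hypothetical sample data; it assumes every key of the dictionary is a column of df:

```python
import numpy as np
import pandas as pd


def filter_by_dict(df, conditions):
    """Return the rows of df where each column in `conditions` equals its value.

    `conditions` maps column name -> required value, e.g. {'x': 1, 'y': 2, 'z': 1}.
    """
    if not conditions:
        # no conditions: nothing to filter out
        return df.copy()
    mask = np.logical_and.reduce([df[k] == v for k, v in conditions.items()])
    return df[mask]


# Hypothetical sample data standing in for the question's data
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})

print(filter_by_dict(df, {'x': 1, 'y': 3, 'z': 1}))
```

The empty-dictionary guard is there because reducing an empty list gives a scalar True rather than a row mask.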

Another approach is to merge with a one-row DataFrame created from the dictionary:

df1 = pd.DataFrame([d1]).merge(df)
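A short sketch of what the merge does, on hypothetical sample data: by default it is an inner join on the columns both frames share (here x, y and z), so several filter rows can be matched in one call. The result gets a new default index instead of the row labels of df; carrying them through reset_index is one way to keep them:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's data
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})
d1 = {'x': 1, 'y': 2, 'z': 1}
d2 = {'x': 1, 'y': 3, 'z': 1}

# Inner merge on the shared columns keeps only rows of df equal to a filter row
df1 = pd.DataFrame([d1, d2]).merge(df)
print(df1)

# Keep df's original row labels by turning them into a column and restoring them
kept = (pd.DataFrame([d1, d2])
        .merge(df.reset_index())
        .set_index('index')
        .rename_axis(None))
print(kept)
```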

EDIT:

For a general solution, you can parse each value of the filter file into a tuple (column, operator, value) and apply the matching function from the operator module:

df1 = pd.DataFrame({0: ['x==1', 'x==1'], 1: ['y==2', 'y<=3'], 2: ['z!=1', 'z>1']})
print (df1)
      0     1     2
0  x==1  y==2  z!=1
1  x==1  y<=3   z>1
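The answer builds df1 by hand. If the conditions live in a text file with one set of conditions per line and whitespace between conditions (the layout discussed in the comments), something like the following sketch should produce the same frame. The file content is hypothetical and is read from a string with io.StringIO so the example runs on its own; for a real file you would pass its path instead:

```python
import io

import pandas as pd

# Hypothetical filter file: one set of conditions per line,
# individual conditions separated by whitespace
text = """x==1 y==2 z!=1
x==1 y<=3 z>1
"""

# sep=r'\s+' splits on runs of whitespace; header=None keeps columns 0, 1, 2
df1 = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None)
print(df1)
```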

import operator
import re

ops = {'>': operator.gt,
       '<': operator.lt,
       '>=': operator.ge,
       '<=': operator.le,
       '==': operator.eq,
       '!=': operator.ne}

# if the value is numeric, parse it to float; otherwise (e.g. a string) leave it unchanged
def try_num(x):
    try:
        return float(x)
    except ValueError:
        return x

L = df1.to_dict('records')
#https://stackoverflow.com/q/52620865/2901002
rgx = re.compile(r'([<>=!]+)')
parsed = [[rgx.split(v) for v in d.values()] for d in L]
L = [[(x, op, try_num(y)) for x,op,y in ps] for ps in parsed]
print (L)
[[('x', '==', 1.0), ('y', '==', 2.0), ('z', '!=', 1.0)], 
 [('x', '==', 1.0), ('y', '<=', 3.0), ('z', '>', 1.0)]]

Now filter by the first element of the list, i.e. the first row of the file:

m = np.logical_and.reduce([ops[j](df[i], k) for i, j, k in L[0]])
print (m)
[False False  True False]
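Putting the pieces together gives a two-argument function of the kind the question asks for: the data plus a frame of condition strings. A sketch under some assumptions: every cell holds exactly one condition of the form column, operator, value; conditions in the same row are combined with AND and the rows with OR. The name filter_data and the sample data are hypothetical, and the regex here additionally tolerates spaces around the operator:

```python
import operator
import re

import numpy as np
import pandas as pd

# Comparison operators, as in the answer's ops table
OPS = {'>': operator.gt, '<': operator.lt,
       '>=': operator.ge, '<=': operator.le,
       '==': operator.eq, '!=': operator.ne}
# Capture the operator; optional spaces around it are dropped by the split
RGX = re.compile(r'\s*([<>=!]+)\s*')


def try_num(value):
    """Parse numeric text to float; leave anything else unchanged."""
    try:
        return float(value)
    except ValueError:
        return value


def filter_data(df, filter_df):
    """Keep the rows of df that satisfy all conditions of at least one row
    of filter_df (AND within a row, OR across rows)."""
    row_masks = []
    for conditions in filter_df.itertuples(index=False):
        masks = []
        for cond in conditions:
            col, op, value = RGX.split(cond.strip())
            masks.append(OPS[op](df[col], try_num(value)))
        row_masks.append(np.logical_and.reduce(masks))
    return df[np.logical_or.reduce(row_masks)]


# Hypothetical data; the conditions are the same as df1 in the answer
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})
conditions = pd.DataFrame({0: ['x==1', 'x==1'],
                           1: ['y==2', 'y<=3'],
                           2: ['z!=1', 'z>1']})

print(filter_data(df, conditions))
```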

edited Oct 03 '18 at 09:26

answered Oct 02 '18 at 10:55

Thanks for the response. If possible, can you add how to split one condition into multiple conditions? Here you might have done it manually, but I need a generic function. One more thing: the filtering conditions are also in pandas DataFrame format. – Neeraja Bandreddi Oct 02 '18 at 11:02
The above answer works only when my input is in dictionary format, but that is not enough for me. I have to filter the data based on the conditions I mentioned in the question. There is no way to filter the data like that? – Neeraja Bandreddi Oct 02 '18 at 11:25
@neeraja - Sorry, I don't understand. What exactly is the input of your generic function? – jezrael Oct 02 '18 at 11:28
@neeraja - Is it always `==`? – jezrael Oct 02 '18 at 11:35
I didn't get you. Actually my filter file is in text format; when I loaded it into Python it became a pandas DataFrame. If my format is wrong, can you suggest the right one and the filtering process? Thanks in advance. – Neeraja Bandreddi Oct 02 '18 at 11:39
@neeraja - Is it possible for a row in the filter file to look like `x==1 y<=3 z>1`? – jezrael Oct 02 '18 at 11:40
No, I want to filter the records that satisfy the given conditions. The x, y and z values I mentioned in the specified condition may use `=`, `>=`, `!=` or any other operator. I am asking how I can filter by these conditions and in what format I should give them. – Neeraja Bandreddi Oct 02 '18 at 11:43
@neeraja, be modest when you ask for help. When you end with "?", it sounds as if you are asking someone to work for you. It's great that we benefit from the expert minds here in the community, but it hurts when someone puts "?" every time; better would be to say `"Is there a way to filter the data like that"` instead of `"there is no way to filter the data like that?"`. Also upvote the answer if you like the solution, or accept it if it solves your puzzle. – Karn Kumar Oct 02 '18 at 11:47
Okay,I got it. Thank you – Neeraja Bandreddi Oct 02 '18 at 11:50
@jezrael, I love your expertise and the way you present solutions here for many learners like us; your pandas tricks are really great. My one cent & +1. – Karn Kumar Oct 02 '18 at 11:50
@neeraja - It's really complicated, but you need to convert the text file to dicts and then use my solution. – jezrael Oct 02 '18 at 12:55
@neeraja - I added a general solution; a [new question on SO](https://stackoverflow.com/q/52620865/2901002) was also created for parsing a file with operators. – jezrael Oct 03 '18 at 09:30

score 1 · Answer 2 · answered Oct 02 '18 at 12:53

Since you have a single numeric dtype, you can use the underlying NumPy array:

res = df[(df.values == [1, 2, 1]).all(1)]

print(res)

   x  y  z
0  1  2  1

For a generic function with list input:

def filter_df(df, L):
    return df[(df.values == L).all(1)]

res = filter_df(df, [1, 2, 1])

If you need a dictionary input:

def filter_df(df, d):
    L = list(map(d.get, df))
    return df[(df.values == L).all(1)]

res = filter_df(df, {'x': 1, 'y': 2, 'z': 1})
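In the dictionary version above, `list(map(d.get, df))` looks up every column of df, so d has to name all of them: a missing key becomes None, and that column then compares unequal in every row. A minimal sketch, assuming you only want to constrain some of the columns, restricts the comparison to the keys of d; filter_df_cols and the sample data are hypothetical:

```python
import pandas as pd


def filter_df_cols(df, d):
    """Hypothetical variant of filter_df: compare only the columns named in d."""
    cols = list(d)
    values = [d[c] for c in cols]
    # df[cols].values has one column per key of d, so `values` is compared row by row
    return df[(df[cols].values == values).all(1)]


# Hypothetical sample data standing in for the question's data
df = pd.DataFrame({'x': [1, 2, 1, 1],
                   'y': [3, 2, 2, 2],
                   'z': [1, 1, 2, 1]})

print(filter_df_cols(df, {'x': 1, 'z': 1}))
```

Like the original, this relies on the selected columns sharing a numeric dtype so that the NumPy comparison is element-wise.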

score 1 · Accepted Answer · answered Oct 03 '18 at 09:37

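import pandas as pd

def filter_function(df, filter_df):
    # every cell of filter_df holds an equality condition such as 'x==1'
    rows = [[cell.split('==') for cell in row] for row in filter_df.values.tolist()]
    # the column names are taken from the first row of conditions
    cols = [col.strip() for col, _ in rows[0]]
    final_df = df[cols]
    matches = []
    for row in rows:
        # the values come from text, so compare the data as strings too,
        # each column against its own value
        values = [value.strip() for _, value in row]
        mask = (final_df.astype(str) == values).all(axis=1)
        matches.append(final_df[mask])
    # collect the matching rows for all condition rows
    return pd.concat(matches)

res = filter_function(df, filter_df)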
