1

I am trying to do a filter based on one column. For example, I want to remove the transaction whenever I see it has account_no 1111

Input

Date        Trans   account_no
2017-12-11  10000   1111
2017-12-11  10000   1112
2017-12-11  10000   1113
2017-12-11  10001   1111
2017-12-11  10002   1113

Desired Output

Date        Trans   account_no
2017-12-11  10002   1113

Edit:

This is different than operator chaining because you are dealing with a duplication/conditional filter

Reese Fitzmaurice
  • 307
  • 1
  • 2
  • 8
  • Possible duplicate of [pandas: filter rows of DataFrame with operator chaining](https://stackoverflow.com/questions/11869910/pandas-filter-rows-of-dataframe-with-operator-chaining) – splash58 Dec 27 '17 at 21:12
  • Why `2017-12-11 10000 1113` is gone after the operation? – Tai Dec 27 '17 at 21:35

3 Answers3

2

By using issubset + transform

df[~df.groupby('Trans').account_no.transform(lambda x : set([1111]).issubset(x))]
Out[1658]: 
         Date  Trans  account_no
4  2017-12-11  10002        1113
BENY
  • 317,841
  • 20
  • 164
  • 234
  • do you know how to modify to the code to filter out for multiple numbers not just 1111 but lets you also add 1112 – Reese Fitzmaurice Dec 28 '17 at 20:43
  • @ReeseFitzmaurice `df[~df.Trans.isin(df.loc[df.account_no.isin([1112,1111]),'Trans'])]` – BENY Dec 28 '17 at 20:49
  • I am still trying to keep the original question logic, would this work? df=df[~df.groupby('Trans').account_no.transform(lambda x : set([1112,1111]).issubset(x))] – Reese Fitzmaurice Dec 28 '17 at 21:29
  • @ReeseFitzmaurice this will not then , cause they will match both [1112 and 1111, when both condition satisfied, they will return true, seems like you need logic or rather than and right ? – BENY Dec 28 '17 at 22:13
1

You could do this in two steps. First find all Trans values that have an account_no that is ever equal to 1111 using .loc. Then select all other transactions with isin()

df[~df.Trans.isin(df.loc[df.account_no == 1111,'Trans'])]

         Date  Trans  account_no
4  2017-12-11  10002        1113
DJK
  • 8,924
  • 4
  • 24
  • 40
0

You can use .loc to filter based on a series.

def complex_filter_criteria(x):
    return x != 1111
df.loc[df['account_no'].apply(complex_filter_criteria)]

df['account_no'].apply(complex_filter_criteria) will return a series of True/False evaluations for each entry in the column account_no. Then, when you pass that into df.loc, it returns a dataframe consisting only of the rows corresponding to the True evaluations from the series.

enter image description here

DragonBobZ
  • 2,194
  • 18
  • 31