I have a dataframe that shows the results of different tests taken by three students, and I would like to select a subset of it. As soon as a student gets the result "poor" on any test, they can't be considered for the experiment and all of their rows need to be dropped from the dataset. My dataframe looks like this:

import pandas as pd

data = {'Name':  ['Peter', 'Peter','Anna', 'Anna','Anna', 'Max'],
        'Result': ["Good", "Good", "Good", "Good", "poor", "Very Good"],
         }

df = pd.DataFrame (data, columns = ['Name','Points'])

This means that I first need to find out who has a "poor" result and then delete every row belonging to that person. My desired outcome in this example would be:

df_res = pd.DataFrame({'Name': ('Peter', 'Peter', 'Max'),
                       'Result': ("Good", "Good", "Very Good")})

Can anyone help me here? Deleting all the rows with the corresponding names is the part I'm stuck on.

Sanoj
  • Use `groupby('Name')['Points'].min()` to get each student's minimum points, then pick the ones where the minimum is `< 20`. Finally, use that to filter the original dataframe. – Barmar Apr 24 '20 at 16:56
  • Thanks a lot for your answer, I should have been more precise. I changed the example a bit so the min function doesn't work. – Sanoj Apr 24 '20 at 17:07
  • See https://stackoverflow.com/questions/35445132/python-pandas-select-group-where-a-specific-column-contains-zeroes for how to find a group that contains a specific value. – Barmar Apr 24 '20 at 17:11
  • You don't need the `columns` parameter to build the dataframe. Here it actually gets in the way: `data` has no 'Points' key, so the 'Result' column is left out. – Todd Apr 24 '20 at 17:31

1 Answer

Find the names that have a 'poor' result, then use that list to keep only the rows whose name isn't in it.

>>> df = pd.DataFrame(data) # leave out the columns parameter.
>>>
>>> df[~df.Name.isin(df[df.Result == 'poor'].Name.values)]
    Name     Result
0  Peter       Good
1  Peter       Good
5    Max  Very Good
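
Broken into steps, the one-liner above could be sketched roughly like this. The intermediate names `poor_names` and `has_poor` are only illustrative, and the data is copied from the question:

```python
import pandas as pd

data = {
    "Name": ["Peter", "Peter", "Anna", "Anna", "Anna", "Max"],
    "Result": ["Good", "Good", "Good", "Good", "poor", "Very Good"],
}
df = pd.DataFrame(data)

# Step 1: collect the names that have at least one 'poor' result.
poor_names = df.loc[df["Result"] == "poor", "Name"].unique()

# Step 2: build a True/False Series marking every row of those students.
has_poor = df["Name"].isin(poor_names)

# Step 3: invert it with ~ so only the remaining students are kept.
df_res = df[~has_poor]
print(df_res)
```

Note that the original index labels (0, 1, 5) are kept; chaining `.reset_index(drop=True)` would renumber the rows if that matters.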

"Boolean masking" I think we call it.

Aren't we being a bit unfair to Anna - she has more good results than all the rest. So what - she had a bad day...

=) anyway...

You can also use the .drop() method:

>>> df.drop(index=df[df.Name.isin(df[df.Result == 'poor'].Name)].index)
    Name     Result
0  Peter       Good
1  Peter       Good
5    Max  Very Good
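
A group-based variant is also possible, in the spirit of the question linked in the comments. This is only a sketch: it assumes the same `data` as in the question, and the variable names are made up for illustration. Both versions should give the same three rows as above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Peter", "Peter", "Anna", "Anna", "Anna", "Max"],
    "Result": ["Good", "Good", "Good", "Good", "poor", "Very Good"],
})

# Variant A: flag 'poor' rows, then spread "any poor in this student's
# group" back onto every row of that student with transform("any").
is_poor = df["Result"].eq("poor")
group_has_poor = is_poor.groupby(df["Name"]).transform("any")
res_transform = df[~group_has_poor]

# Variant B: keep only the groups in which no result is 'poor'.
res_filter = df.groupby("Name").filter(
    lambda g: g["Result"].ne("poor").all()
)

print(res_transform)
print(res_filter)
```

The `isin` approach is probably the simplest for a single condition; the group-based forms may read more naturally once the rule depends on several values per student.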
Todd
  • Thanks a lot! And don't worry, it's only a hypothetical example ;) – Sanoj Apr 24 '20 at 20:05
  • That's relieving... haven't we all been Anna at some point in our lives.. Filtered out by the system.. just another entry in a row.. =P – Todd Apr 24 '20 at 21:57