-1
df= pd.DataFrame({'Age': [30, 35, 37, 33, 34, 30],
                  'Name': ['A', 'B', 'B', 'A', 'A', 'B']})
df2= pd.DataFrame({'Age': [30, 35], 'Name': ['A', 'B']})

How would I go on about filtering df for df2 so that for each df2['Name'] in df, the Age in df should be below df2['Age']? My solution was to copy paste df2 into conditions like so but I think creating a list/dataframe for the filter would be a better idea as you may extend it in future:

display(
    df.loc[
        (
            ((df["Name"] == "A") & (df["Age"] < 30))
            | ((df["Name"] == "B") & (df["Age"] < 35))
        ),
        ["Age", "Name"],
    ]
)
Emi OB
  • 2,814
  • 3
  • 13
  • 29
Blob
  • 371
  • 1
  • 9
  • The problem statement seems a bit unclear to me. Do you have two dataframes (df and df2) and try to create a new dataframe from these, or only df is given and you're filtering that to get df2? – Bitswazsky Jun 30 '22 at 06:59
  • Thanks for the comment, df2 is basically my filter. So df and df2 are given and I'm trying to procude a reduced dataframe df by excluding the records that should be filtered by the values in df2. – Blob Jun 30 '22 at 07:03
  • @Bitwazsky there is a 30 against B which is less than 35 – Emi OB Jun 30 '22 at 07:12
  • ah, missed that. – Bitswazsky Jun 30 '22 at 07:13

3 Answers3

3

IIUC, you can use merge then query:

out = df.merge(df2, on='Name', suffixes=(None, '2')).query('Age < Age2')[df.columns]
print(out)

# Output
   Age Name
5   30    B

Step by step:

# Merge data
>>> out = df.merge(df2, on='Name', suffixes=(None, '2'))
   Age Name  Age2
0   30    A    30
1   33    A    30
2   34    A    30
3   35    B    35
4   37    B    35
5   30    B    35

# Filter out
>>> out = out.query('Age < Age2')
   Age Name  Age2
5   30    B    35

# Restore columns
>>> out = out[df.columns]
   Age Name
5   30    B
Corralien
  • 109,409
  • 8
  • 28
  • 52
2

Use DataFrame.query with generate conditions by df2 values:

d = df2.to_dict('records')

q = ' | '.join(f'Name == \"{x["Name"]}\" & Age < {x["Age"]}' for x in d)
print (q)
Name == "A" & Age < 30 | Name == "B" & Age < 35
    

df = df.query(q)[["Age", "Name"]]
print (df)
   Age Name
5   30    B

Or filter by mask in list comprehension and join mask by np.logical_or.reduce:

mask = np.logical_or.reduce([(df['Name'] == x["Name"]) & (df.Age < x["Age"]) for x in d])
df = df.loc[mask, ["Age", "Name"]]
print (df)
   Age Name
5   30    B
    
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

I would merge the 2 dataframes together, to have another column from df2 of the age filter, like this:

dfmerge = df.merge(df2, on='Name', suffixes=[None, ' Filter'])

   Age Name  Age Filter
0   30    A          30
1   33    A          30
2   34    A          30
3   35    B          35
4   37    B          35
5   30    B          35

You can then use .loc to filter on where Age is less than Age Filter:

dfout = dfmerge.loc[dfmerge['Age'] < dfmerge['Age Filter'], ['Age', 'Name']]

   Age Name
5   30    B
Emi OB
  • 2,814
  • 3
  • 13
  • 29