how to use lambda function to select larger values from two python dataframes whilst comparing them by date?

Question

I want to map through the rows of df1 and compare those with the values of df2 , by month and day, across every year in df2,leaving only the values in df1 which are larger than those in df2, to add into a new column, 'New'. df1 and df2 are of the same size, and are indexed by 'Month' and 'Day'. what would be the best way to do this?

df1=pd.DataFrame({'Date':['2015-01-01','2015-01-02','2015-01-03','2015-01-``04','2005-01-05'],'Values':[-5.6,-5.6,0,3.9,9.4]})

df1.Date=pd.to_datetime(df1.Date) df1['Day']=pd.DatetimeIndex(df1['Date']).day df1['Month']=pd.DatetimeIndex(df1['Date']).month df1.set_index(['Month','Day'],inplace=True) df1

df2 = pd.DataFrame({'Date':['2005-01-01','2005-01-02','2005-01-03','2005-01-``04','2005-01-05'],'Values':[-13.3,-12.2,6.7,8.8,15.5]})

df2.Date=pd.to_datetime(df1.Date) df2['Day']=pd.DatetimeIndex(df2['Date']).day df2['Month']=pd.DatetimeIndex(df2['Date']).month df2.set_index(['Month','Day'],inplace=True) df2

df1 and df2

df2['New']=df2[df2['Values']<df1['Values']]

gives ValueError: Can only compare identically-labeled Series objects

I have also tried df2['New']=df2[df2['Values'].apply(lambda x: x < df1['Values'].values)]

Please add a minimal reproducible example. With just an image it's very time consuming to recreate your problem. That said, try this: df2 = df2.loc[df2['Values'] > df1['Values']].values — Hayden Eastwood, Apr 23 '20 at 18:42
thank you. with df2 = df2.loc[df2['Values'] > df1['Values']].values, I have TypeError: unhashable type: 'numpy.ndarray'. — Bluetail, Apr 23 '20 at 19:07
ok - please add 3-4 lines of sample data in python form (not image). — Hayden Eastwood, Apr 23 '20 at 19:09
ok - I'll try. I have just tried c = np.maximum(df1,df2) which works, but I want the larger values of df1 in a separate column or a new df, rather than mixed up with df2 values. — Bluetail, Apr 23 '20 at 19:26
I'm confused. df2 = df2.loc[df2['Values'] > df1['Values']].values gives AttributeError: 'numpy.ndarray' object has no attribute 'loc'. However, df2.info() ! MultiIndex: .... — Bluetail, Apr 23 '20 at 19:58
Ok I've tried my original solution with your data and it works fine on my machine. ie. I used df2 = df2.loc[df2['Values'] > df1['Values']].values and got a result just fine. What version of Python are you using? — Hayden Eastwood, Apr 23 '20 at 20:19

score 0 · Answer 1 · answered Apr 23 '20 at 19:09

The best way to handle your problem is by using numpy as a tool. Numpy has an attribute called "where"that helps a lot in cases like this.

This is how the sentence works:

df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false').

First import the library:

import numpy as np

Using the condition provided by you:

df2['New'] = np.where(df2['Values'] > df1['Values'], df2['Values'],'')

So, I think that solves your problem... You can change the value passed to the False condition to every thin you want, this is only an example.

Tell us if it worked!

No, it gives the same error... ValueError: Can only compare identically-labeled Series objects — Bluetail, Apr 23 '20 at 19:17

score 0 · Answer 2 · answered Apr 23 '20 at 20:08

0

Let´s try two possible solutions:

The first solution is to sort the index first.

df1.sort_index(inplace=True)
df2.sort_index(inplace=True)

Perform a simple test to see if it works!

df1 == df2

it is possible to raise some kind of error, so if that happens, try this correction instead:

df1.sort_index(inplace=True, axis=1)
df2.sort_index(inplace=True, axis=1)

The second solution is to drop the indexes and reset it:

df1.sort_index(inplace=True)
df2.sort_index(inplace=True)

Perform a simple test to see if it works!

df1 == df2

See if it works and tell us the result.

answered Apr 23 '20 at 20:08

Felipe solares

1
1

I got all 'False' entries after df1==df2... thank you for helping me! :O) – Bluetail Apr 24 '20 at 17:43
strange enough, this time, it worked! df2['New'] = np.where(df2['Values'] > df1['Values'], df2['Values'],'') hurray! ðŸ‘¨ :-) – Bluetail Apr 24 '20 at 17:47
P.S. I used the merge function which worked (before your answer today) but it was very cumbersome. I prefer your solution. – Bluetail Apr 24 '20 at 17:50
Thank you very much, my friend. Very happy that worked! – Felipe solares Apr 25 '20 at 22:51

how to use lambda function to select larger values from two python dataframes whilst comparing them by date?

2 Answers2