0

I want to map through the rows of df1 and compare those with the values of df2 , by month and day, across every year in df2,leaving only the values in df1 which are larger than those in df2, to add into a new column, 'New'. df1 and df2 are of the same size, and are indexed by 'Month' and 'Day'. what would be the best way to do this?

df1=pd.DataFrame({'Date':['2015-01-01','2015-01-02','2015-01-03','2015-01-``04','2005-01-05'],'Values':[-5.6,-5.6,0,3.9,9.4]})

df1.Date=pd.to_datetime(df1.Date) df1['Day']=pd.DatetimeIndex(df1['Date']).day df1['Month']=pd.DatetimeIndex(df1['Date']).month df1.set_index(['Month','Day'],inplace=True) df1

df2 = pd.DataFrame({'Date':['2005-01-01','2005-01-02','2005-01-03','2005-01-``04','2005-01-05'],'Values':[-13.3,-12.2,6.7,8.8,15.5]})

df2.Date=pd.to_datetime(df1.Date) df2['Day']=pd.DatetimeIndex(df2['Date']).day df2['Month']=pd.DatetimeIndex(df2['Date']).month df2.set_index(['Month','Day'],inplace=True) df2

df1 and df2

df2['New']=df2[df2['Values']<df1['Values']]

gives ValueError: Can only compare identically-labeled Series objects

I have also tried df2['New']=df2[df2['Values'].apply(lambda x: x < df1['Values'].values)]

Community
  • 1
  • 1
Bluetail
  • 1,093
  • 2
  • 13
  • 27
  • Please add a minimal reproducible example. With just an image it's very time consuming to recreate your problem. That said, try this: df2 = df2.loc[df2['Values'] > df1['Values']].values – Hayden Eastwood Apr 23 '20 at 18:42
  • thank you. with df2 = df2.loc[df2['Values'] > df1['Values']].values, I have TypeError: unhashable type: 'numpy.ndarray'. – Bluetail Apr 23 '20 at 19:07
  • ok - please add 3-4 lines of sample data in python form (not image). – Hayden Eastwood Apr 23 '20 at 19:09
  • ok - I'll try. I have just tried c = np.maximum(df1,df2) which works, but I want the larger values of df1 in a separate column or a new df, rather than mixed up with df2 values. – Bluetail Apr 23 '20 at 19:26
  • I'm confused. df2 = df2.loc[df2['Values'] > df1['Values']].values gives AttributeError: 'numpy.ndarray' object has no attribute 'loc'. However, df2.info() ! MultiIndex: .... – Bluetail Apr 23 '20 at 19:58
  • Ok I've tried my original solution with your data and it works fine on my machine. ie. I used df2 = df2.loc[df2['Values'] > df1['Values']].values and got a result just fine. What version of Python are you using? – Hayden Eastwood Apr 23 '20 at 20:19

2 Answers2

0

The best way to handle your problem is by using numpy as a tool. Numpy has an attribute called "where"that helps a lot in cases like this.

This is how the sentence works:

df1['new column that will contain the comparison results'] = np.where(condition,'value if true','value if false').

First import the library:

import numpy as np

Using the condition provided by you:

df2['New'] = np.where(df2['Values'] > df1['Values'], df2['Values'],'')

So, I think that solves your problem... You can change the value passed to the False condition to every thin you want, this is only an example.

Tell us if it worked!

0

Let´s try two possible solutions:

The first solution is to sort the index first.

df1.sort_index(inplace=True)
df2.sort_index(inplace=True)

Perform a simple test to see if it works!

df1 == df2

it is possible to raise some kind of error, so if that happens, try this correction instead:

df1.sort_index(inplace=True, axis=1)
df2.sort_index(inplace=True, axis=1)

The second solution is to drop the indexes and reset it:

df1.sort_index(inplace=True)
df2.sort_index(inplace=True)

Perform a simple test to see if it works!

df1 == df2

See if it works and tell us the result.