I have two pandas DataFrames of unequal length; an example is given below. My code should go through each value of apples in the first DataFrame and find the matching value in the second one (a match always exists in the second DataFrame). When it finds a match, it should store the difference between the oranges values of the two DataFrames in the first DataFrame. I currently do this with two nested for loops, shown below. The code works, but my actual data has 2 million entries and the second DataFrame has 800 entries, so the nested loops slow my program down a lot. Is there a more efficient way of doing this?
import pandas as pd

trial = {'apples': [2, 4, 1, 5, 3, 2, 1, 1, 4, 5], 'oranges': [8, 5, 9, 4, 2, 6, 7, 5, 1, 3]}
trial1 = {'apples': [1, 2, 3, 4, 5], 'oranges': [2, 5, 6, 3, 1]}
df = pd.DataFrame.from_dict(trial)
df1 = pd.DataFrame.from_dict(trial1)

F = []  # one difference per row of df
for i in df.apples.index:
    for j in df1.apples.index:
        # .ix was removed in pandas 1.0; .loc is the label-based replacement
        if df.apples.loc[i] == df1.apples.loc[j]:
            F.append(df.oranges.loc[i] - df1.oranges.loc[j])
df['difference'] = F
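
For comparison, here is a minimal sketch of one vectorized way to express the same lookup, using the example data from the question. It assumes that every apples value appears exactly once in the second DataFrame (true for the example, but an assumption about the real data). The name `lookup` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'apples': [2, 4, 1, 5, 3, 2, 1, 1, 4, 5],
                   'oranges': [8, 5, 9, 4, 2, 6, 7, 5, 1, 3]})
df1 = pd.DataFrame({'apples': [1, 2, 3, 4, 5],
                    'oranges': [2, 5, 6, 3, 1]})

# Lookup Series indexed by apples, holding the df1 oranges values.
# Assumption: apples values are unique in df1; Series.map needs a unique index.
lookup = df1.set_index('apples')['oranges']

# Map each apples value in df to its df1 oranges value, then subtract.
df['difference'] = df['oranges'] - df['apples'].map(lookup)

print(df)
```

If the uniqueness assumption does not hold for the real data, `Series.map` will raise an error, and a `DataFrame.merge` on apples may be a closer fit, although it would then produce one row per matching pair rather than one per row of the first DataFrame.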