Assuming I've 2 data frames, df_a
and df_b
as below:
import pandas as pd
df_a
df_a = pd.DataFrame({"Letters": ['E', 'H', 'O', 'N', 'M', 'K', 'T', 'X'],
"Greek":['epsilon', 'eta', 'omicron', 'nu', 'mu',
'kappa', 'tau', 'chi']})
Greek Letters
0 epsilon E
1 eta H
2 omicron O
3 nu N
4 mu M
5 kappa K
6 tau T
7 chi X
df_b
df_b = pd.DataFrame({"Letters": ['Y', 'E', 'N', 'X', 'B']})
df_b["Greek"] = ""
Letters Greek
0 Y
1 E
2 N
3 X
4 B
I want to fill in column 'Greek' in df_b
using corresponding values from df_a
and I can do that using nested loops as follows:
for i, row in df_a.iterrows():
temp1 = row['Letters']
for k, row in df_b.iterrows():
temp2 = row['Letters']
if temp1 == temp2:
df_b.loc[k, "Greek"]=df_a.loc[i, "Greek"]
df_b
Letters Greek
0 Y
1 E epsilon
2 N nu
3 X chi
4 B
First, I'm wondering if I can achieve the same result more efficiently using merge
, zip
, join
or the other concatenation functions. Second, if I provide a matching else
to the if
statement, as follows:
else: df_b.loc[k, "Greek"] = float('nan')
I expected this would fill in the empty cells with NaN
and not modify the other cells. Instead, I get something like:
df_b
Letters Greek
0 Y NaN
1 E NaN
2 N NaN
3 X chi
4 B NaN
I appreciate your thoughts on the 2 points. Any additional feedback is welcome.