0

I have two data frames:

df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'],
                    'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None],
                    'Value': [5, 3, 4, 7, 1, 3, 6, 5, 9, 5, 4]})

df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'],
                    'Group': ['x', 'y', 'z', 'y'],
                    'Repeat': [3, 2, 1, 2]}).

All the NaN of df1["Name"] have to fill by df2["Name"] by matching "Group". Can repeat matching and filling by "Repeat" times.

Desired output:

df = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'],
                   'Name': ['A', 'B', 'C', 'A', 'A', 'A', 'B', 'A', 'A', 'B', 'B'],
                   'Value': [5, 3, 4, 7, 1, 3, 6, 5, 9, 5, 4]}) 

Also looking for the fastest run time.

I did this

for index2, row2 in df2.iterrows():
    for i in range(0, row2[2]):
        for index1 in df1.index:
            if df1.iloc[index1, 1] == row2[1] and df1.iloc[ndex1, 1] == 'NaN':
                df1.iloc[ndex1, 1] = row2[0]
                break

Looking for simpler and faster solution.

Bracula
  • 335
  • 1
  • 3
  • 14
  • Hi Parvez! Welcome to StackOverflow. Would you be able to paste the actual datasets into your question (using `df.to_dict()`) – Mark Aug 15 '23 at 09:04
  • also, in df2, how do you decide if y, 2 is A or B? – Mark Aug 15 '23 at 09:38
  • 1
    If anyone need. df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None], 'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}) df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B'], 'Group': ['x', 'y', 'z', 'y'], 'Repeat': [3, 2, 1, 2]}). – parvez alam Aug 16 '23 at 06:20
  • see my question above – Mark Aug 16 '23 at 06:25
  • @Itération122442 already added – Mark Aug 16 '23 at 06:27
  • @Mark as answer of your 2nd question. Doesn't matter the sequence of A & B as long the NaN been filled up matching 'Group,. – parvez alam Aug 16 '23 at 06:32
  • I don't think you understand. In the case of df1, the last four rows. There are 4 ys. there is no y, 4 row in df2 – Mark Aug 16 '23 at 06:42
  • In df2 there are 2 ys with repeat 2. Add the repeat of ys you will get 4. – parvez alam Aug 16 '23 at 07:10
  • you might have to break things down further, and explain them again. The four y values are 5, 9, 5, 4. Explain to me like I'm 5 how I get from those to A, A, B, B – Mark Aug 16 '23 at 07:31

1 Answers1

0

Certainly! Based on your provided data, you can fill in the missing values in the 'Name' column of df1 using values from the 'Name' column of df2. The matching should be based on the combination of 'Group' columns from df2.

Here's how you can achieve this:

import pandas as pd

# Sample data for demonstration
data1 = {'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x', 'z', 'y', 'y', 'y', 'y'],
         'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None],
         'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}

data2 = {'Name': ['A', 'A', 'B', 'B'],
         'Group': ['x', 'y', 'z', 'y'],
         'Repeat': [3, 2, 1, 2]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Merge df1 and df2 on 'Group' columns
merged_df = df1.merge(df2, on=['Group'], how='left')

# Fill missing values in 'Name' column of df1 using values from merged_df
df1['Name'] = df1['Name'].fillna(merged_df['Name_y'])

print("Updated df1:")
print(df1)

In this example, we first merge the two DataFrames df1 and df2 based on the 'Group' columns. This creates a temporary DataFrame merged_df with the matching values from df2 for each group and repeat combination. Then, we use the fillna() function to fill the missing values in the 'Name' column of df1 with the corresponding values from the 'Name_y' column of merged_df.

Devam Sanghvi
  • 548
  • 1
  • 5
  • 11
  • ah, the 'Repeated' column was a red herring – Mark Aug 16 '23 at 07:40
  • @Devam Sanghvi I like your code. It does all i wanted. How you will do, if there are multiple columns with NaN to fill up? – parvez alam Aug 16 '23 at 23:39
  • @Devam Sanghvi If df1 has some extra rows with NaN will give a wrong result. Try df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x', 'z', 'y', 'y', 'y', 'y', 'x', 'x'], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None, None, None], 'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ,13]}) Can't accept your code as solution. – parvez alam Aug 18 '23 at 00:57