1

I have 2 Df with an identical structure to be merged with a condition. I need to have all the id raw but not duplicated and when there is a same id I need to choice the one from the dataframe with val==1 or just the row from the df1 if df1.val == df2.val

df1
    id   val   X1   X2   X3
0  az1     1   10   12   16
1  az2     0   15   17   19
3  az3     0    8    9   17
4  az5     1    3    2   67
5  az7     0   31    2   68

df2
    id   val   X1   X2   X3
0  az1     1   12   10   15
1  az2     1   13   15   18
3  az4     1   10    9   47
5  az6     0   33   12   68
6  az7     0   1    26   90

I want a new df3 like this:

df3
    id   val   X1   X2   X3
0  az1     1   10   12   16
1  az2     1   13   15   18
3  az3     0    8    9   17 
4  az4     1   10    9   47
5  az5     1    3    2   67
6  az6     0   33   12   68
7  az7     0   31    2   68

It contains:

  • all the df1 rows
  • all the df2 rows where df2.val==1 AND where df2.id is not present in df1
  • all the df2 rows where (df2.id == df1.id) AND df2.val==1 AND df1.val==0 (overwrite the df1 raw)
  • all the df2 rows with df2.val == 0 AND where df2.id is not present in df1

With the command:

Unique=pd.concat([df1, df2]).drop_duplicates(subset ="id", keep = False, inplace = False).sort_values('id')

Unique
    id   val   X1   X2   X3
0  az3     0    8    9   17 
1  az4     1   10    9   47
2  az5     1    3    2   67
3  az6     0   33   12   68

Then:

Temp = pd.concat([df1, Unique]).drop_duplicates().sort_values('id')

      Temp
    id   val   X1   X2   X3
0  az1     1   10   12   16
1  az3     0    8    9   17 
2  az4     1   10    9   47
3  az5     1    3    2   67
4  az6     0   33   12   68
5  az7     0   31    2   68

Finaly:

df2_1 =df2[df2["val"]==1]
df3 = pd.concat([df2_1, Temp]).drop_duplicates(subset ="id", keep = "first", inplace = False).sort_values('id')
df3
    id   val   X1   X2   X3
0  az1     1   10   12   16
1  az2     1   13   15   18
3  az3     0    8    9   17 
4  az4     1   10    9   47
5  az5     1    3    2   67
6  az6     0   33   12   68
7  az7     0   31    2   68

Any idea to do the job in more fast and elegant way?

Emiliano
  • 46
  • 4

0 Answers0