I have two dataframes. Some rows are present in both, some only in one dataframe. My goal is two have only the unique rows displayed and with different colors per dataframe.
This is what I managed:
import pandas as pd
# create first DataFrame
data1 = {'id': [1, 2, 3], 'name': ['John', 'Mary', 'Bob']}
df1 = pd.DataFrame(data1)
colname=df1.columns[-1]
# create second DataFrame
data2 = {'id': [2, 3, 4], 'name': ['Mary', 'Bob', 'Alice']}
df2 = pd.DataFrame(data2)
display(df2)
# merge DataFrames on 'id' column
merged = pd.merge(df1, df2, on='id', how='outer', suffixes=('', '_2'))
# drop doubles
merged = merged[merged[colname].notnull() != merged[colname+"_2"].notnull()]
# define function to apply background color to rows
def color_rows(row):
if pd.isna(row[colname+'_2']):
# row is only in df1
return ['background-color: palegreen']*len(row)
elif pd.isna(row[colname]):
# row is only in df2
return ['background-color: powderblue']*len(row)
else:
# row is in both DataFrames
return ['']*len(row)
# apply background color to merged DataFrame
styled = merged.style.apply(color_rows, axis=1)
# display styled DataFrame
styled
my problem is that all columns get duplicated:
id name name_2 0 1 John nan 1 2 Mary Mary 2 3 Bob Bob 3 4 nan Alice
the higlighting works as expected, but I dont know hot to remove the lines that appear in both dataframes and remove all "_2" columns.
I can't apply styled.filter(like='_2', axis=1)
because styled is not a dataframe.
The problem is getting rid of the columns while keeping the color.
The suggestion from here does not work:
styled.data = styled.data.drop(colname+"_2", axis=1)
neither does styled.data.drop(list(df.filter(regex='_2')),inplace=True)