compare pandas dataframes display unique rows and style backgroundcolor by source,

Question

I have two dataframes. Some rows are present in both, some only in one dataframe. My goal is two have only the unique rows displayed and with different colors per dataframe.

This is what I managed:

import pandas as pd

# create first DataFrame
data1 = {'id': [1, 2, 3], 'name': ['John', 'Mary', 'Bob']}


df1 = pd.DataFrame(data1)
colname=df1.columns[-1]
# create second DataFrame
data2 = {'id': [2, 3, 4], 'name': ['Mary', 'Bob', 'Alice']}
df2 = pd.DataFrame(data2)
display(df2)
# merge DataFrames on 'id' column
merged = pd.merge(df1, df2, on='id', how='outer', suffixes=('', '_2'))

# drop doubles
merged =  merged[merged[colname].notnull() !=  merged[colname+"_2"].notnull()]

# define function to apply background color to rows
def color_rows(row):
    if pd.isna(row[colname+'_2']):
        # row is only in df1
        return ['background-color: palegreen']*len(row)
    elif pd.isna(row[colname]):
        # row is only in df2
        return ['background-color: powderblue']*len(row)
    else:
        # row is in both DataFrames
        return ['']*len(row)



# apply background color to merged DataFrame
styled = merged.style.apply(color_rows, axis=1)

# display styled DataFrame
styled

my problem is that all columns get duplicated:

id name name_2 0 1 John nan 1 2 Mary Mary 2 3 Bob Bob 3 4 nan Alice

the higlighting works as expected, but I dont know hot to remove the lines that appear in both dataframes and remove all "_2" columns.

I can't apply styled.filter(like='_2', axis=1) because styled is not a dataframe.

The problem is getting rid of the columns while keeping the color.

The suggestion from here does not work: styled.data = styled.data.drop(colname+"_2", axis=1)

neither does styled.data.drop(list(df.filter(regex='_2')),inplace=True)

Mike · Answer 1 · 2023-03-11T01:52:32.730

Try this updated code:

import pandas as pd
import numpy as np
import random

# create first DataFrame
data1 = {'id': [1, 2, 3], 'name': ['John', 'Mary', 'Bob']}
df1 = pd.DataFrame(data1)

# create second DataFrame
data2 = {'id': [2, 3, 4], 'name': ['Mary', 'Bob', 'Alice']}
df2 = pd.DataFrame(data2)

# merge DataFrames on 'id' column
merged = pd.merge(df1, df2, on='id', how='outer', suffixes=('', '_2'))

# drop duplicates
merged = merged.drop_duplicates()

# define function to apply random background color to rows
def random_color():
    r = lambda: random.randint(0, 255)
    return f'rgb({r()}, {r()}, {r()})'

def color_rows(row):
    if pd.isna(row['name_2']):
        # row is only in df1
        return ['background-color: palegreen']*len(row)
    elif pd.isna(row['name']):
        # row is only in df2
        return ['background-color: powderblue']*len(row)
    else:
        # row is in both DataFrames
        return [f'background-color: {random_color()}']*len(row)

# apply background color to merged DataFrame
styled = merged.style.apply(color_rows, axis=1)

# display styled DataFrame
styled

the additional columns are still there. thank your for trying. — Bindestrich, Mar 11 '23 at 18:19

score 0 · Accepted Answer · answered Mar 11 '23 at 01:48

0

Just after writing the question down I found out about hide

styled.hide(list(styled.data.filter(regex='_2')),axis="columns")

does it.

answered Mar 11 '23 at 01:48

Bindestrich

9
3

compare pandas dataframes display unique rows and style backgroundcolor by source,

2 Answers2