Compare two pandas df rowwise and apply conditional formating to matching values

Question

I think its a small problem, but I did not succeed with a code solution. I have two dataframes df_diff and df_all. Both pandas df have the same key column (e.g. "Key") but different column names.

The code should iterate over rows of df_diff, take the key value, look up the row with the key value in df_all, and then iterate over all cells of this row of df_diff and search if any of the cells matches one cell value of the corresponding row in df_all.

If there is a match, this cell should receive red background color.

Note that the column names a different between these data frames, except the Key column.

Here is an example input: df_diff

Key	Column_1	Column 2
Key2	Value2	Value3
Key3	Value3	Value4
Key4	Value5	Value6

df_all

Key	Column_all_A	Column_all_B
Key2	Value8	Value2
Key3	Value3	Value10
Key6	Value0	Value11

The Expected Output:

Have you already tried solving this? If so can you share the code that you have? — Simon Champney, Aug 02 '23 at 23:15
Hi Simon, I found my own solution after some back and forth. however, my solution is not elegant. I could not figure out how to apply df.style.apply() or df.style.applymap() to the problem. — NeuroNaut, Aug 03 '23 at 00:17

score 1 · Accepted Answer · answered Aug 03 '23 at 00:29

1

Here is one of the options that uses a mapper with a listcomp to build the styles :

lstyles = [
    ["background-color:lightcoral" # <-- adjust the color here
    if v in df_diff.set_index("Key").T.to_dict("list").get(k, []) else ""
    for v in vals] for k, *vals in df_all.values
]

use_cols = df_all.columns.difference(["Key"])

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, columns=use_cols),
                       axis=None, subset=use_cols)
)

Output :

answered Aug 03 '23 at 00:29

Timeless

22,580
4
12
30

1

Wow, great ... thank you so much Timeless. I upvoted and accepted your answer. BTW I am eager to ask such an experienced user: what was in your case the most important source(s) to learn advanced Python skill? Guess it could be an interesting source for others too ... – NeuroNaut Aug 03 '23 at 07:27
You’re welcome ;) We’re on the same boat, by the way. I think there is no hidden secret. Everything is practice! We have to keep going, fail, and fail again until we understand Python better ;) Pandas [docs](https://pandas.pydata.org/docs/#) is a gold mine, it’s my only source. Happy coding ;) – Timeless Aug 03 '23 at 13:02
:) great thx again – NeuroNaut Aug 04 '23 at 11:51

score 0 · Answer 2 · answered Aug 03 '23 at 00:13

Here is my answer to my own question:

import pandas as pd

# Sample data for df_diff
data_diff = {
'Key': ['Key2', 'Key3', 'Key4'],
'Column_1': ['Value2', 'Value3', 'Value5'],
'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
'Key': ['Key2', 'Key3', 'Key6'],
'Column_all_A': ['Value8', 'Value3', 'Value0'],
'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply red background to df_all
def highlight_matching_cells(row_all):
    # Get the key value from the current row in df_all
    key_value = row_all['Key']

    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]

    # Check if a matching row is found in df_diff
    if not row_diff.empty:
        # Iterate over columns in df_all (except the 'Key' column)
        for col_all in row_all.index[1:]:
            # Iterate over columns in the matching row of df_diff (except the 'Key' column)
            for col_diff in row_diff.columns[1:]:
                # Check if the cell value in df_all matches any cell value in the matching row of df_diff
                if row_all[col_all] == row_diff[col_diff].iloc[0]:
                # If a match is found, return a list with red background for the matched cell in df_all
                    return ['background-color: red' if col == col_all else '' for col in row_all.index]
    # If no match is found, return a list with no background color for all cells in df_all
    return ['' for _ in row_all.index]

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)

# Display the highlighted DataFrame
df_highlighted

This give me the desired output:

However, does anyone have a more elegant and shorter way? I would like to define a styler() function with formatting condition in it and the apply the format using df.style.apply() or df.style.applymap() to each of the matching cells.

Compare two pandas df rowwise and apply conditional formating to matching values

2 Answers2