0

I think its a small problem, but I did not succeed with a code solution. I have two dataframes df_diff and df_all. Both pandas df have the same key column (e.g. "Key") but different column names.

The code should iterate over rows of df_diff, take the key value, look up the row with the key value in df_all, and then iterate over all cells of this row of df_diff and search if any of the cells matches one cell value of the corresponding row in df_all.

If there is a match, this cell should receive red background color.

Note that the column names a different between these data frames, except the Key column.

Here is an example input: df_diff

Key Column_1 Column 2
Key2 Value2 Value3
Key3 Value3 Value4
Key4 Value5 Value6

df_all

Key Column_all_A Column_all_B
Key2 Value8 Value2
Key3 Value3 Value10
Key6 Value0 Value11

The Expected Output: Expected Output of df_all with conditioned formatting

TylerH
  • 20,799
  • 66
  • 75
  • 101
NeuroNaut
  • 68
  • 4
  • Have you already tried solving this? If so can you share the code that you have? – Simon Champney Aug 02 '23 at 23:15
  • Hi Simon, I found my own solution after some back and forth. however, my solution is not elegant. I could not figure out how to apply df.style.apply() or df.style.applymap() to the problem. – NeuroNaut Aug 03 '23 at 00:17

2 Answers2

1

Here is one of the options that uses a mapper with a listcomp to build the styles :

lstyles = [
    ["background-color:lightcoral" # <-- adjust the color here
    if v in df_diff.set_index("Key").T.to_dict("list").get(k, []) else ""
    for v in vals] for k, *vals in df_all.values
]

use_cols = df_all.columns.difference(["Key"])

out = (
    df_all.style.apply(lambda _: pd.DataFrame(lstyles, columns=use_cols),
                       axis=None, subset=use_cols)
)

Output :

enter image description here

Timeless
  • 22,580
  • 4
  • 12
  • 30
  • 1
    Wow, great ... thank you so much Timeless. I upvoted and accepted your answer. BTW I am eager to ask such an experienced user: what was in your case the most important source(s) to learn advanced Python skill? Guess it could be an interesting source for others too ... – NeuroNaut Aug 03 '23 at 07:27
  • You’re welcome ;) We’re on the same boat, by the way. I think there is no hidden secret. Everything is practice! We have to keep going, fail, and fail again until we understand Python better ;) Pandas [docs](https://pandas.pydata.org/docs/#) is a gold mine, it’s my only source. Happy coding ;) – Timeless Aug 03 '23 at 13:02
  • :) great thx again – NeuroNaut Aug 04 '23 at 11:51
0

Here is my answer to my own question:

import pandas as pd

# Sample data for df_diff
data_diff = {
'Key': ['Key2', 'Key3', 'Key4'],
'Column_1': ['Value2', 'Value3', 'Value5'],
'Column_2': ['Value3', 'Value4', 'Value6']
}
df_diff = pd.DataFrame(data_diff)

# Sample data for df_all
data_all = {
'Key': ['Key2', 'Key3', 'Key6'],
'Column_all_A': ['Value8', 'Value3', 'Value0'],
'Column_all_B': ['Value2', 'Value10', 'Value11']
}
df_all = pd.DataFrame(data_all)

# Function to find matching cells and apply red background to df_all
def highlight_matching_cells(row_all):
    # Get the key value from the current row in df_all
    key_value = row_all['Key']

    # Filter the corresponding row in df_diff using the key value
    row_diff = df_diff[df_diff['Key'] == key_value]

    # Check if a matching row is found in df_diff
    if not row_diff.empty:
        # Iterate over columns in df_all (except the 'Key' column)
        for col_all in row_all.index[1:]:
            # Iterate over columns in the matching row of df_diff (except the 'Key' column)
            for col_diff in row_diff.columns[1:]:
                # Check if the cell value in df_all matches any cell value in the matching row of df_diff
                if row_all[col_all] == row_diff[col_diff].iloc[0]:
                # If a match is found, return a list with red background for the matched cell in df_all
                    return ['background-color: red' if col == col_all else '' for col in row_all.index]
    # If no match is found, return a list with no background color for all cells in df_all
    return ['' for _ in row_all.index]

# Apply the function to each row in df_all
df_highlighted = df_all.style.apply(highlight_matching_cells, axis=1)

# Display the highlighted DataFrame
df_highlighted

This give me the desired output: enter image description here

However, does anyone have a more elegant and shorter way? I would like to define a styler() function with formatting condition in it and the apply the format using df.style.apply() or df.style.applymap() to each of the matching cells.

NeuroNaut
  • 68
  • 4