
The following code produces the data frame shown below. It compares two data frames and marks every cell that changed from one to the other:

import pandas as pd
import numpy as np

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
        "D": ["OQEWINVSKD", "DKVLNQIOEVM", "asdlikvn", "asdkvnddvfvfkdd", np.nan],
    }
)

b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": [
            "jamba",
            "dimez",
            "pocketfresh",
            "reverbb",
            "jackalack",
            "boombackimmatouchit",
        ],
    }
)


def equalize_length(short, long):
    return pd.concat(
        [
            short,
            pd.DataFrame(
                {
                    col: ["nan"] * (long.shape[0] - short.shape[0])
                    for col in short.columns
                }
            ),
        ]
    ).reset_index(drop=True)


def equalize_width(short, long):
    return pd.concat(
        [
            short,
            pd.DataFrame({col: [] for col in long.columns if col not in short.columns}),
        ],
        axis=1,
    ).reset_index(drop=True)


def equalize(df, other_df):
    if df.shape[0] <= other_df.shape[0]:
        df = equalize_length(df, other_df)
    else:
        other_df = equalize_length(other_df, df)
    if df.shape[1] <= other_df.shape[1]:
        df = equalize_width(df, other_df)
    else:
        other_df = equalize_width(other_df, df)
    df = df.fillna("nan")
    other_df = other_df.fillna("nan")
    return df, other_df

a, b = equalize(a, b)

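# Element-wise comparison; both frames have the same shape after equalize()
comparevalues = a.values == b.values

# Positions (row, column) of the cells that differ
rows, cols = np.where(~comparevalues)

# Rewrite each differing cell in `a` as " old --> new "
for row, col in zip(rows, cols):
    a.iloc[row, col] = " {} --> {} ".format(a.iloc[row, col], b.iloc[row, col])
a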

[Image: the resulting data frame, with changed cells shown as "old --> new"]

I would like to color-code the cells based on conditions in the output. I'd like to implement something similar to the code below, but it does not work:

conditions  = [ 'np.nan -->',
               '--> np.nan', 
               '!np.nan --> !np.nan']

Colors     = [ 'Green', 
               'Red', 
               'Yellow']
    
a = np.select(conditions, Colors)

The error message I get is the following:

[Image: screenshot of the error message]

Put simply, how can I apply my conditions and colors to the output? The expected output is a color-coded version of `a`, based on the conditions and colors listed above.

Laurent
HelpMeCode
  • You're missing the comma on the first line of your conditions, so the first condition is currently the string `'np.nan -->--> np.nan'` due to implicit concatenation – nigh_anxiety Jun 13 '22 at 18:21
  • @nigh_anxiety thanks -- I've updated my code and the error message that I still get. – HelpMeCode Jun 13 '22 at 18:23
  • @nigh_anxiety also, although I put strings of np.nan -->, that's just how I knew how to currently convey what I'm trying to do. I'd like to find the part where ```np.nan + " --> "``` exists, similar for the other conditions. – HelpMeCode Jun 13 '22 at 18:25
  • I don't have a lot of experience with numpy, but from the documentation, the conditions list needs to be boolean expressions where `x` is the value. So the first element would be something like `x.char.startswith("np.nan -->")`. But again I'm not a numpy expert so I'm likely not 100% accurate on that expression. – nigh_anxiety Jun 13 '22 at 18:35
  • Would there be a better way to do this in Pandas? @nigh_anxiety – HelpMeCode Jun 13 '22 at 18:40
  • Sorry, I can't help with pandas either. I know general Python and I'm learning a bit of everything. Hopefully someone with more expertise in either will come along. – nigh_anxiety Jun 13 '22 at 20:00

1 Answer


You can define a helper function to colorize values depending on given conditions:

def color_differences(val):
    # Non-string cells (e.g. integers) get no color
    if not isinstance(val, str):
        return "color: "
    if "nan --> nan" in val or val == "nan":
        color = "yellow"
    elif "nan -->" in val:  # nan --> value
        color = "green"
    elif "--> nan" in val:  # value --> nan
        color = "red"
    elif " --> " in val:  # value --> other value
        color = "blue"
    else:  # unchanged cell
        color = ""
    return f"color: {color}"

And then, at the end of your code, add and run the following cell:

a.style.applymap(color_differences)
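
A comment below asks about coloring the cell background instead of the font. A minimal sketch of that variant follows, assuming the `" old --> new "` cell format produced by the question's code and the color scheme the question lists (green for nan to value, red for value to nan, yellow for value to value). The names `background_differences` and `style_differences` are hypothetical helpers, not part of pandas. Note that `Styler.applymap` was deprecated in pandas 2.1 in favor of `Styler.map`, so the sketch uses whichever one is available.

```python
import pandas as pd


def background_differences(val):
    """Return a CSS background rule for one cell (hypothetical helper)."""
    if not isinstance(val, str):
        return ""
    text = val.strip()
    if " --> " not in text:
        return ""  # unchanged cell: no highlight
    # Split " old --> new " into its two sides
    old, new = (part.strip() for part in text.split(" --> ", 1))
    if old == "nan":
        color = "green"  # nan --> value
    elif new == "nan":
        color = "red"  # value --> nan
    else:
        color = "yellow"  # value --> other value
    return f"background-color: {color}"


def style_differences(df):
    """Apply the rule cell by cell; Styler needs jinja2 to be installed."""
    styler = df.style
    # Prefer Styler.map (pandas >= 2.1), fall back to Styler.applymap
    apply_cellwise = getattr(styler, "map", None) or styler.applymap
    return apply_cellwise(background_differences)
```

In a notebook, `style_differences(a)` as the last line of a cell should render the highlighted frame. Splitting on `" --> "` rather than using substring checks avoids coloring a value that merely ends in "nan".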

[Image: the styled data frame output]
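
If you would rather keep the `np.select` approach from the question, the conditions have to be boolean arrays rather than strings. The sketch below is one way to build them, under the same assumption about the `" old --> new "` cell format. The frame `diff` is a small made-up stand-in for `a`, and `color_css` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Small stand-in for the diffed frame `a` (values assumed for illustration)
diff = pd.DataFrame(
    {
        "B": [" abcd --> dah ", "ijkl", " nan --> freshhhhhhh "],
        "D": [" OQEWINVSKD --> nan ", "nan", "nan"],
    }
)


def color_css(df):
    """Return a frame of CSS strings with the same shape as df."""
    text = df.astype(str).apply(lambda col: col.str.strip())
    # One boolean mask per condition
    added = text.apply(lambda col: col.str.startswith("nan --> "))
    removed = text.apply(lambda col: col.str.endswith(" --> nan"))
    changed = text.apply(lambda col: col.str.contains(" --> ", regex=False))
    # np.select picks the first matching condition, so the two more
    # specific masks come before the generic "changed" mask
    conditions = [
        added.to_numpy(dtype=bool),
        removed.to_numpy(dtype=bool),
        changed.to_numpy(dtype=bool),
    ]
    colors = ["color: green", "color: red", "color: yellow"]
    css = np.select(conditions, colors, default="")
    return pd.DataFrame(css, index=df.index, columns=df.columns)


css = color_css(diff)
```

The resulting frame can be handed to the Styler in one call with `a.style.apply(color_css, axis=None)`, since `axis=None` passes the whole data frame to the function and expects a same-shaped frame of CSS strings back.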

Laurent
  • How would you express the case `not np.nan` --> `not np.nan`? @Laurent – HelpMeCode Jun 13 '22 at 22:23
  • example being abcd --> dah, efgh --> fupa, refresh --> dimez, etc @Laurent – HelpMeCode Jun 13 '22 at 22:24
  • I see. Thanks! I'm noticing that some values that are blue shouldn't be, such as `uhuh`, `jackalack`, and `jamba`, as well as `1,2,3,4,5` in column `A`. These values didn't show a change via the `-->` indicator, so they wouldn't need to be highlighted. @Laurent – HelpMeCode Jun 14 '22 at 14:33
  • Also, how would we change cell color instead of font color? @Laurent. – HelpMeCode Jun 14 '22 at 14:34
  • Right, I didn't get that part at first, see my updated answer. As for background colors, see (and upvote, if useful ;-) this other answer: https://stackoverflow.com/a/71981811/11246056. Cheers. – Laurent Jun 14 '22 at 20:48