0

i have 2 dataframes, from which i have to identify a difference in cells. Wherever i will find a difference, i have to change color of that cell(background color) in first dataframe aswell as in second dataframe. In my case first dataframe cell has to be colored with #FFCCCC and second with #DAF6FF. The output of these dataframes has to be saved in two different excel files.

i have tried with these answers: https://kanoki.org/2019/01/02/pandas-trick-for-the-day-color-code-columns-rows-cells-of-dataframe/

Python pandas dataframe and excel: Add cell background color

All these talks about using openpyxml with styled. My challenge is i have to update those cells which has changes in comparison phase itself and that(coloring) has to be reflected in excel output. How do i achieve this ? hope some one would help me out here to get the right and better method.

Note: my data is huge is size(around 10000 lines in row columns).

Input:

DF1:

col1 col2 col3
10 Yes BNG
12 PE CHN
30 LIKE DHL

DF2:

col1 col2 col3
10 No BNG
13 PE CHI
30 SNM DHL

Output Has to be in excel with cells colored like this. enter image description here

My code is here:

'''

def split_compare_differ_df(difference_in_df):
    bg_delete = "background-color: red"
    bg_insert = "background-color: blue"
    unique_of_df1 = df1
    unique_of_df2 = df2
    for i in unique_of_df1.itertuples():
        for j in unique_of_df2.itertuples():
            if i[1] == j[1]:
                for idx, (a, b) in enumerate(zip(i, j)):
                    x=list(i)
                    y=list(j)
                    if not idx ==0:
                        if a == b:
                            x[idx] = a
                            y[idx] = b
                            print(f'Index {idx} match: {a}')
                        else:
                            x[idx] = '{}{}'.format(bg_delete,a)
                            y[idx] = '{}{}'.format(bg_insert, b)
                            print(f'Index {idx} no match: {a} vs {b}')
#                             targetFileActiveSheet.cell(row=rowNum, column=colNum).fill = PatternFill(bgColor='FFEE08', fill_type = 'solid')
                    i = tuple(x)
                    j = tuple(y)

                i_list = list(i)
                i_list = i_list[1:]
                i_tuple = tuple(i_list)
                j_list = list(j)
                j_list = j_list[1:]
                j_tuple = tuple(j_list)
                unique_of_df1.loc[i[0]] = i_tuple
                unique_of_df2.loc[j[0]] = j_tuple
    return(unique_of_df1,unique_of_df2)

'''

ras
  • 41
  • 1
  • 10

1 Answers1

1

The comparison code can be simplified by generating a "diff matrix":

from styleframe import StyleFrame, Styler

sf1 = StyleFrame({'col1': [10, 12, 30], 'col2': ['Yes', 'PE', 'LIKE'], 'col3': ['BNG', 'CHN', 'DHL']})
sf2 = StyleFrame({'col1': [10, 12, 30], 'col2': ['No', 'PE', 'SNM'], 'col3': ['BNG', 'CHI', 'DHL']})

sf1_diff = Styler(bg_color='#FFCCCC')
sf2_diff = Styler(bg_color='#DAF6FF')

ne = sf1.data_df != sf2.data_df

print(ne)

Now ne is a diff matrix containing True where there is a difference between the 2 dataframes:

    col1   col2   col3
0  False   True  False
1  False  False   True
2  False   True  False

Now all we have to do is to style the original frames accordingly:

for col in ne.columns:
    sf1.apply_style_by_indexes(indexes_to_style=ne[ne[col]].index,
                               styler_obj=sf1_diff,
                               cols_to_style=col)
    sf2.apply_style_by_indexes(indexes_to_style=ne[ne[col]].index,
                               styler_obj=sf2_diff,
                               cols_to_style=col)

sf1.to_excel('sf1.xlsx').save()
sf2.to_excel('sf2.xlsx').save()

And we end up with sf1.xlsx:

enter image description here

and sf2.xlsx:

enter image description here

If you want both frames on the same sheet you can use

writer = StyleFrame.ExcelWriter('output.xlsx')
sf1.to_excel(writer)
sf2.to_excel(writer, startcol=len(sf1.columns) + 2)
writer.save()
DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • i tried first phase of your solution, which suggests to get compare output in object 'ne'. However its not working for me as it says "Can only compare identically-labeled DataFrame objects", here i can't drop my index column as it required for further processing... @DeepSpace – ras May 19 '21 at 13:29