I have a large df with many columns and rows, with usually two rows per certain identifier as df is used for reconciliation. Is there any way to streamline identification of non-identifier columns which cause mismatch?
import pandas as pd
df = pd.DataFrame({'col_1': ['A', 'B', 'C', 'B', 'C', 'D', 'E'],
'identifier': [ 1, 2, 3, 2, 3, 4, 4],
'col_3': [ 10, 20, 30, 21, 31, 40, 41],
'col_4': [ 1, 1, 1, 1, 1, 1, 1]
})
In above df, it would be
- col_1 for identifier 4 (D vs. E)
- col_3 for identifier 2/3/4 (20 vs. 21, 30 vs. 31, 40 vs. 41)
Open to any representation that makes it easy to isolate the columns causing mismatch, their values and identifiers.