The following code compares two DataFrames (synthetic data standing in for sheets imported from Excel) and marks every cell that differs:
```python
import pandas as pd
import numpy as np

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
    }
)
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "Z": ["dah", "fupa", "ijkl", "danju", "uhuh"],
        "C": ["jamba", "dimez", "pocketfresh", "reverbb", "jackalack"],
    }
)

# Element-wise comparison of the underlying arrays (column labels are ignored)
comparevalues = a.values == b.values
# Row/column positions of every cell that differs
rows, cols = np.where(~comparevalues)
for row, col in zip(rows, cols):
    # Overwrite the differing cell in `a` with " old --> new "
    a.iloc[row, col] = " {} --> {} ".format(a.iloc[row, col], b.iloc[row, col])
```
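The same marking can also be done without the explicit loop. This is a minimal sketch, assuming (as the loop does) that both frames have the same shape and that cells are paired by position rather than by label; `np.frompyfunc` wraps the same format string so it is applied to every cell at once, and `np.where` keeps the original value wherever the cells match:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
    }
)
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "Z": ["dah", "fupa", "ijkl", "danju", "uhuh"],
        "C": ["jamba", "dimez", "pocketfresh", "reverbb", "jackalack"],
    }
)

# Positional, cell-by-cell equality (only valid when the shapes match)
same = a.values == b.values

# Vectorised version of the " old --> new " format used in the loop
fmt = np.frompyfunc(lambda x, y: " {} --> {} ".format(x, y), 2, 1)

# Keep the original value where cells match, the marker text where they differ
result = pd.DataFrame(
    np.where(same, a.values, fmt(a.values, b.values)),
    index=a.index,
    columns=a.columns,
)
print(result)
```

Unlike the loop, this leaves `a` unchanged and returns the annotated copy in `result`.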
However, as soon as I extend DataFrame `b` by one more row, the code breaks:
```python
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": [
            "jamba",
            "dimez",
            "pocketfresh",
            "reverbb",
            "jackalack",
            "boombackimmatouchit",
        ],
    }
)
```
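With the extra row, `a.values` has shape (5, 3) while `b.values` has shape (6, 3), so `==` can no longer pair the cells one-to-one: depending on the NumPy version it either raises a `ValueError` or collapses to a single `False` instead of an element-wise mask. One possible workaround, sketched here under the assumption that rows should be matched by their index label, is to reindex both frames to the union of their row labels first. The row that exists only in `b` is then filled with NaN in `a` and shows up as `nan --> ...`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
    }
)
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": ["jamba", "dimez", "pocketfresh", "reverbb", "jackalack",
              "boombackimmatouchit"],
    }
)

print(a.values.shape, b.values.shape)  # (5, 3) (6, 3): the cells no longer pair up

# Give both frames the same row labels; rows missing from a frame become NaN
rows = a.index.union(b.index)
a_al, b_al = a.reindex(rows), b.reindex(rows)
av, bv = a_al.values, b_al.values

# Shapes now match, so the positional comparison works again
# (note: a NaN on both sides would still count as different, since NaN != NaN)
same = av == bv
fmt = np.frompyfunc(lambda x, y: " {} --> {} ".format(x, y), 2, 1)
result = pd.DataFrame(np.where(same, av, fmt(av, bv)), index=rows, columns=a_al.columns)
print(result)
```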
I have the same problem if I extend `a` with an additional column:
```python
a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
        "D": ["OQEWINVSKD", "DKVLNQIOEVM", "asdlikvn", "asdkvnddvfvfkdd", np.nan],
    }
)
```
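Here the mismatch is in the columns, and column `D` also contains `np.nan`, which never equals itself under `==`. The sketch below handles both axes at once, assuming rows and columns should be matched by label: `DataFrame.align(join="outer")` gives both frames the union of their index and columns (cells that exist on only one side become NaN), and cells that are missing on both sides are counted as equal. `mark_differences` is a hypothetical helper name, and the example pairs the four-column `a` with the six-row `b`:

```python
import numpy as np
import pandas as pd


def mark_differences(left, right):
    """Hypothetical helper: copy of `left` with " old --> new " in every differing cell."""
    # Outer-align both axes; labels missing from one frame are filled with NaN there
    left, right = left.align(right, join="outer")
    lv, rv = left.values, right.values
    # NaN != NaN, so treat cells that are missing on both sides as equal
    same = (lv == rv) | (pd.isna(lv) & pd.isna(rv))
    fmt = np.frompyfunc(lambda x, y: " {} --> {} ".format(x, y), 2, 1)
    return pd.DataFrame(np.where(same, lv, fmt(lv, rv)),
                        index=left.index, columns=left.columns)


a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
        "D": ["OQEWINVSKD", "DKVLNQIOEVM", "asdlikvn", "asdkvnddvfvfkdd", np.nan],
    }
)
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": ["jamba", "dimez", "pocketfresh", "reverbb", "jackalack",
              "boombackimmatouchit"],
    }
)

result = mark_differences(a, b)
print(result)
```

Matching by label is a design choice: if the sheets can have rows inserted in the middle, a key column (set with `set_index`) is usually a better label than the default positional index.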
How can I still compare these two DataFrames for differences when their shapes don't match?
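If a list of the differences is enough (rather than an annotated copy of `a`), pandas 1.1 and later provide `DataFrame.compare`. It only accepts identically labelled frames, so this sketch aligns them first, again assuming label-based matching and using the extended versions of both frames. The cast to `object` is a precaution so that string columns and the all-NaN column added by `align` are compared the same way:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5"],
        "B": ["abcd", "efgh", "ijkl", "uhyee", "uhuh"],
        "C": ["jamba", "refresh", "portobello", "performancehigh", "jackalack"],
        "D": ["OQEWINVSKD", "DKVLNQIOEVM", "asdlikvn", "asdkvnddvfvfkdd", np.nan],
    }
)
b = pd.DataFrame(
    {
        "A": ["1", 2, "3", 4, "5", 6],
        "B": ["dah", "fupa", "ijkl", "danju", "uhuh", "freshhhhhhh"],
        "C": ["jamba", "dimez", "pocketfresh", "reverbb", "jackalack",
              "boombackimmatouchit"],
    }
)

# compare() requires identical index and columns, so outer-align both axes first
left, right = a.align(b, join="outer")
# Use one common dtype for every column before comparing
left, right = left.astype(object), right.astype(object)

# Rows/columns that are entirely equal are dropped; matching NaNs are not reported
diff = left.compare(right)
print(diff)
```

The result has a two-level column index: for each column that differs, `self` holds the value from `a` and `other` the value from `b`, with NaN in cells that matched.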