2

I'm comparing two tables to monitor for changes. I need a resulting table that shows a before and after the change per user per attribute changed.

I used the pandas .compare method and below is an example of my current result, but I can't figure out what the next steps would be. If I'm going about this completely wrong feel free to let me know as well.

I currently have a table that looks like this:

User ID Name Favorite Status
1 Steve Chocolate Before
2 John Chocolate Before
1 Stevy Chocolate After
2 John Vanilla After

I need it to look like this:

User ID Attribute Changed Before After
1 Name Steve Stevy
2 Name John John
1 Favorite Chocolate Chocolate
2 Favorite Chocolate Vanilla

I can eliminate the lines where no changes were made.

2 Answers2

2

Maybe via pivot?

import pandas as pd

df = pd.read_clipboard() # Your df here

out = (
    df
    .pivot(index="User ID", columns="Status")
    .swaplevel(axis=1)
    .stack()
    .reset_index(names=["User ID", "Attribute Changed"])
    .rename_axis(None, axis=1)
)

out:

   User ID Attribute Changed      After     Before
0        1          Favorite  Chocolate  Chocolate
1        1              Name      Stevy      Steve
2        2          Favorite    Vanilla  Chocolate
3        2              Name       John       John
Chrysophylaxs
  • 5,818
  • 3
  • 10
  • 21
  • 1
    Similar, Melting then pivoting: `df.melt(['User ID', 'Status']).pivot(index=["User ID", "variable"], columns="Status", values='value').reset_index().rename_axis(None, axis=1)` – BeRT2me Apr 03 '23 at 17:22
  • Eliminate no-changes after pivoting: `df[df.After.ne(df.Before)]` – BeRT2me Apr 03 '23 at 17:28
  • 1
    Apparently I don't have the reputation to upvote but this worked perfectly. Thank you! – Yisroel Len Apr 03 '23 at 17:35
0

You could unstack and concatenate like

# import pandas as pd
# df = pd.read_html('https://stackoverflow.com/questions/75922021')[0]

unstacked_df = df.set_index(['User ID','Status']).unstack()

kList = {c[0] for c in unstacked_df.columns} ## {'Name','Favorite'}
dfList = [unstacked_df[k] for k in kList] ## [u__df['Name'],u__df['Favorite']]
conc_df = pd.concat(dfList, keys=kList, names=['Attribute Changed', 'User ID'])

df2Cols = ['User ID', 'Attribute Changed', 'After', 'Before']
df2 = conc_df.reset_index().rename_axis(None,axis='columns')[df2Cols]
unstacked_df conc_df df2
unstack concat df2
Driftr95
  • 4,572
  • 2
  • 9
  • 21