I would like to reconstruct a DataFrame from a contingency table that is itself stored as a DataFrame. For example, from ctab I would like to build either df1 (one row per cell, with its frequency) or df2 (one row per observation). Is there a command to do that, or do I need a loop?

import pandas as pd
ctab = pd.DataFrame([[1,2], [2, 1]], columns=["A", "B"], index=["A", "B"])
print(ctab)
df1 = pd.DataFrame([["A","A", 1], ["A","B", 2], ["B","A", 2], ["B","B", 1]], columns=["col", "index", "freq"])
print(df1)
df2 = pd.DataFrame([["A","A"], ["A","B"], ["A","B"], ["B","A"], ["B","A"], ["B","B"]], columns=["col", "index"])
print(df2)
sigbert

1 Answer

No loop is needed. For the first format (df1), you can chain rename_axis, stack, and reset_index:

out = ctab.rename_axis(index='index', columns='col').stack().reset_index(name='freq')

Output:

  index col  freq
0     A   A     1
1     A   B     2
2     B   A     2
3     B   B     1
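
To make the chain easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (named, stacked, long_df) are only illustrative and are not part of the original answer.

```python
import pandas as pd

ctab = pd.DataFrame([[1, 2], [2, 1]], columns=["A", "B"], index=["A", "B"])

# Step 1: name both axes so the resulting columns get meaningful labels.
named = ctab.rename_axis(index="index", columns="col")

# Step 2: stack() moves the column labels into an inner index level,
# giving a Series indexed by (index, col) pairs.
stacked = named.stack()

# Step 3: reset_index() turns both index levels into regular columns
# and names the value column "freq".
long_df = stacked.reset_index(name="freq")

print(long_df)
```

Naming the axes first is what lets reset_index produce the "index" and "col" column names without a separate rename afterwards.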

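For the second format (df2), repeat each row freq times with Index.repeat. Here out.pop('freq') supplies the repeat counts and removes the freq column in the same step: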

out = ctab.rename_axis(index='index', columns='col').stack().reset_index(name='freq')

out = out.loc[out.index.repeat(out.pop('freq'))]

Output:

  index col
0     A   A
1     A   B
1     A   B
2     B   A
2     B   A
3     B   B
mozway