
My dataframe has 4 columns (one dependent variable and 3 independent).

Here's a sample:

Sample data

My desired output is a contingency table, as follows:

Desired output

I can only get a contingency table for one independent variable at a time, using the following code (my df is called 'table'):

pd.crosstab(index=table['Dvar'],columns=table['Var1'])

I can't seem to add any other variables to this. Is the only way to achieve this to make a separate contingency table for each variable (Var1 to Var3) and then merge/join them?
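For comparison, the per-variable approach described above can be sketched as follows. This is a minimal sketch assuming every Var column is coded 0/1 and that only the count of 1s per Dvar level is wanted; the sample values are hypothetical and just mirror the column layout of the question.

```python
import pandas as pd

# Hypothetical 0/1-coded sample data (same layout as the question: 3 Vars + Dvar)
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

parts = []
for var in ['Var1', 'Var2', 'Var3']:
    # One crosstab per independent variable: rows are Dvar levels, columns are 0/1
    ct = pd.crosstab(index=table['Dvar'], columns=table[var])
    # Keep only the "1" column; reindex guards against a variable that is never 1
    parts.append(ct.reindex(columns=[1], fill_value=0)[1].rename(var))

# Join the per-variable counts side by side on the shared Dvar index
combined = pd.concat(parts, axis=1)
print(combined)
```

It works, but it needs one crosstab per variable plus a join, which is why a single aggregation over all columns is usually simpler.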

YoungboyVBA

2 Answers


This is not a good use case for crosstab, because your data is already a contingency table (just not aggregated). Use groupby with sum instead:

import pandas as pd

df = pd.DataFrame([[1,0,0,0],
                   [1,1,1,0],
                   [0,1,1,1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

out = df.groupby('Dvar', as_index=False).sum()

output:

   Dvar  Var1  Var2  Var3
0     0     2     1     1
1     1     0     1     1
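If you do want to stay with crosstab, one option (a technique not used in this answer) is to reshape the frame to long format with melt first, then sum the 0/1 values per Dvar level and variable name. The sketch below assumes the same 0/1 coding as the example above; the names 'variable' and 'value' are just the chosen melt column names.

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 0, 0],
                   [1, 1, 1, 0],
                   [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Long format: one row per (Dvar, variable name, 0/1 value)
long = df.melt(id_vars='Dvar', var_name='variable', value_name='value')

# Summing 0/1 values per cell is the same as counting the 1s
out = pd.crosstab(index=long['Dvar'], columns=long['variable'],
                  values=long['value'], aggfunc='sum')
print(out)
```

The result has the same numbers as the groupby version, with Dvar as the index instead of a column.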
mozway

First of all, a contingency table is for showing the relationship between categorical features.

If you want to see the relationship between the independent and dependent features, try this code:

pd.crosstab([table['Var1'],table['Var2'],table['Var3']],
            table['Dvar'], margins = False)
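To make the difference from the desired output concrete, here is that crosstab applied to a small hypothetical 0/1 frame (values assumed for illustration only). It produces one row per observed (Var1, Var2, Var3) combination and one column per Dvar level, rather than one column per variable.

```python
import pandas as pd

# Hypothetical sample data with the question's column layout
table = pd.DataFrame([[1, 0, 0, 0],
                      [1, 1, 1, 0],
                      [0, 1, 1, 1]], columns=['Var1', 'Var2', 'Var3', 'Dvar'])

# Rows: every observed (Var1, Var2, Var3) combination; columns: Dvar levels
ct = pd.crosstab([table['Var1'], table['Var2'], table['Var3']],
                 table['Dvar'], margins=False)
print(ct)
```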

But, as you mention, to get your desired output, use pandas.DataFrame.groupby instead:

table.groupby('Dvar').sum()
andrewJames