In Pandas, I am trying to manually code a chi-square test. I am comparing row 0 with row 1 in the dataframe below.

data
       2      3      5      10     30
0      3      0      6      5      0
1  33324  15833  58305  54402  38920

For this, I need to calculate the expected cell counts for each cell as: cell(i,j) = rowSum(i)*colSum(j) / sumAll. In R, I can do this simply by taking the outer() products:

Exp_counts <- outer(rowSums(data), colSums(data), "*")/sum(data)    # Expected cell counts

I used numpy's outer product function to imitate the outcome of the above R code:

import numpy as np
import pandas as pd

expected = np.outer(data.sum(axis=1), data.sum(axis=0)) / data.sum().sum()
pd.DataFrame(expected, index=data.index, columns=data.columns)
       2      3      5      10     30
0      2      1      4      3      2
1  33324  15831  58306  54403  38917
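
The values printed above look truncated to whole numbers (for the first cell the formula gives 14 × 33327 / 200798 ≈ 2.32, displayed as 2), which is what integer division would produce. A minimal sketch of the same NumPy computation with the sums cast to float, assuming the observed counts shown in the question; the names `row_sums`, `col_sums`, `total` and `expected` are illustrative only:

```python
import numpy as np
import pandas as pd

# Observed counts copied from the question
data = pd.DataFrame(
    [[3, 0, 6, 5, 0],
     [33324, 15833, 58305, 54402, 38920]],
    columns=[2, 3, 5, 10, 30],
)

row_sums = data.sum(axis=1).to_numpy(dtype=float)  # rowSum(i)
col_sums = data.sum(axis=0).to_numpy(dtype=float)  # colSum(j)
total = data.to_numpy().sum()                      # sumAll

# Expected count for cell (i, j) = rowSum(i) * colSum(j) / sumAll
expected = pd.DataFrame(
    np.outer(row_sums, col_sums) / total,
    index=data.index,
    columns=data.columns,
)
print(expected.round(2))
```

With fractional values, each row and each column of `expected` sums to the same total as the corresponding row or column of `data`, which is a quick sanity check on the result.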

Is it possible to achieve this with a Pandas function?

Zhubarb
  • Would this not work? `not_yet_df = np.outer(data.sum(axis=0), data.sum(axis=1))/ (data.sum().sum())` and then `now_a_df = pd.DataFrame(not_yet_df)`. Besides, you can call the `outer` function from pandas without importing numpy if you want, with `pd.np.outer(..)` – mkln Jan 28 '14 at 11:09
  • Yes, it does (but I realised the order of the axes needs to be inverted when summing). I re-worded my question to include the numpy solution. I am looking for a way to do this with a Pandas function. – Zhubarb Jan 28 '14 at 11:18
  • Why do you need a pandas function anyway? – mkln Jan 28 '14 at 14:09
  • I feel like Pandas is probably able to do this. I want to learn. – Zhubarb Jan 28 '14 at 14:38
  • I think this SO question answers yours: http://stackoverflow.com/questions/18578686/pandas-join-with-outer-product – PabTorre Sep 30 '15 at 02:19

1 Answer

A complete solution using only Pandas built-in methods:

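def outer_product(col):
    # apply() passes one column at a time, so col.sum() is colSum(j)
    numerator = df.sum(axis=1).mul(col.sum())
    denominator = df.sum().sum()  # grand total (sumAll)
    # floordiv truncates to whole numbers, matching the output in the question;
    # use .div(denominator) instead to keep fractional expected counts
    return numerator.floordiv(denominator)

# df is the DataFrame of observed counts (called `data` in the question)
df.apply(outer_product)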
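
As a possible alternative that also stays within Pandas, the outer product can be sketched as a matrix product of a one-column frame and a one-row frame via `DataFrame.dot`. This is only an illustration under the assumption that the input is the table from the question; the label `"k"` is an arbitrary placeholder that lets `dot` align the two frames, and true division is used so the expected counts stay fractional:

```python
import pandas as pd

# Observed counts copied from the question
data = pd.DataFrame(
    [[3, 0, 6, 5, 0],
     [33324, 15833, 58305, 54402, 38920]],
    columns=[2, 3, 5, 10, 30],
)

row_sums = data.sum(axis=1).astype(float)  # rowSum(i), indexed by row label
col_sums = data.sum(axis=0).astype(float)  # colSum(j), indexed by column label

# (n_rows x 1) frame dot (1 x n_cols) frame gives the outer product;
# dot() aligns the left frame's columns with the right frame's index ("k").
expected = row_sums.to_frame("k").dot(col_sums.to_frame("k").T) / data.sum().sum()
print(expected.round(2))
```

The result keeps the row index and column labels of `data`, so it can be compared cell by cell with the observed counts.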

Timings for a DataFrame with 1 million rows: (image not available)

Nickil Maveli