I am currently working on hypothesis testing on datasets.

While reading about chi-square tests, I found this notebook through Kaggle:

https://github.com/viswanathanc/statistics/blob/master/Titanic%20Chi%20Square%20test%20-%20PClass%20vs%20Survied.ipynb

It performs a chi-square hypothesis test on the Titanic dataset.

To test the relationship between passenger class and survival, he used this code:

1) To get the contingency table (observed values):

```python
PClass_survd = pd.pivot_table(data, index=['Pclass'], columns=['Survived'], aggfunc='size')
```
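A minimal sketch of what step 1 produces, assuming a tiny hypothetical DataFrame (made-up values, not the real Titanic data) with the same `Pclass` and `Survived` columns. With `aggfunc='size'`, the pivot table counts the rows that fall into each (class, outcome) cell, so it is the observed contingency table; `pd.crosstab` builds the same table of counts directly.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic DataFrame `data`
# (only the two columns used here; values are made up for illustration).
data = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3, 2, 1, 3, 2],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})

# aggfunc='size' counts the rows in each (Pclass, Survived) cell:
# rows = classes 1, 2, 3; columns = Survived 0 (died) and 1 (survived).
PClass_survd = pd.pivot_table(data, index=["Pclass"], columns=["Survived"], aggfunc="size")

# pd.crosstab gives the same table of observed counts.
observed = pd.crosstab(data["Pclass"], data["Survived"])

print(PClass_survd)
```

Each cell is an observed count O_ij; every later step works from this table.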

2) To see how class and survival are distributed:

```python
# 891 is presumably the total number of passengers (rows) in the dataset
pct_class = PClass_survd.sum(axis=1)/891

pct_survived = PClass_survd.sum(axis=0)/891
```
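Step 2 computes the marginal proportions: row sums divided by the total give the share of passengers in each class, and column sums divided by the total give the share who died or survived. A small sketch on a hypothetical 3×2 table of made-up counts, using the grand total of the table instead of the hard-coded 891 (an assumption: on the real data the two should agree if 891 is the row count):

```python
import pandas as pd

# Hypothetical observed table (rows = Pclass, columns = Survived); counts are made up.
PClass_survd = pd.DataFrame(
    [[1, 2], [1, 2], [3, 1]],
    index=pd.Index([1, 2, 3], name="Pclass"),
    columns=pd.Index([0, 1], name="Survived"),
)

# Grand total N of all observations (plays the role of 891 in the notebook).
n_total = PClass_survd.to_numpy().sum()

# Row sums / N: proportion of passengers in each class (Series indexed by Pclass).
pct_class = PClass_survd.sum(axis=1) / n_total
# Column sums / N: proportion who died (0) and survived (1) (Series indexed by Survived).
pct_survived = PClass_survd.sum(axis=0) / n_total

print(pct_class)
print(pct_survived)
```

Each of the two Series sums to 1, since every passenger is counted once in the row totals and once in the column totals.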

3) To calculate the expected values:

```python
pct_class.to_frame()@(pct_survived.to_frame().T)
```
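A sketch of what `.T` and `@` do in step 3, using hypothetical marginal proportions (made-up numbers for a 3-class × 2-outcome table). `to_frame()` turns each Series into a one-column DataFrame, `.T` transposes the survival frame into a single row, and `@` is matrix multiplication. A (3, 1) column times a (1, 2) row is an outer product: cell (i, j) is `pct_class[i] * pct_survived[j]`. For `@`, pandas aligns the left frame's column labels with the right frame's index labels; both are `[0]` here because `to_frame()` labels an unnamed Series' column `0`.

```python
import numpy as np
import pandas as pd

# Hypothetical marginal proportions (made-up values that each sum to 1).
pct_class = pd.Series([0.3, 0.3, 0.4], index=pd.Index([1, 2, 3], name="Pclass"))
pct_survived = pd.Series([0.5, 0.5], index=pd.Index([0, 1], name="Survived"))

col = pct_class.to_frame()          # shape (3, 1): one row per class, one column
row = pct_survived.to_frame().T     # .T turns the (2, 1) frame into a (1, 2) row
print(col.shape, row.shape)

# (3, 1) @ (1, 2) -> (3, 2); each cell is pct_class[i] * pct_survived[j].
expected_pct = col @ row

# NumPy's outer product computes the same numbers without the DataFrame wrapping.
same = np.outer(pct_class, pct_survived)

# The result holds expected *proportions*; multiplying by N gives expected counts.
n_total = 10                        # hypothetical grand total
expected_counts = expected_pct * n_total
print(expected_counts)
```

So step 3 yields the proportion of passengers you would expect in each cell if class and survival were independent; the chi-square statistic compares counts, so those proportions still need to be scaled by the total.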

I don't understand how the expected values are calculated in step 3. I know that `to_frame()` converts a pandas Series to a DataFrame.
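In conventional notation (the symbols below are introduced here for illustration): let O_ij be the observed count for class i and outcome j, R_i the row total, C_j the column total, and N the grand total (891 in the notebook). Under the null hypothesis that class and survival are independent, the probability of landing in cell (i, j) is the product of the two marginal probabilities, which step 2 estimates. Step 3's column-times-row product builds the whole matrix of those products at once; multiplying by N turns it into expected counts. Whether the notebook applies that final scaling is not shown in the snippet above.

```latex
% Step 2: estimated marginal proportions
\hat{p}_i = \frac{R_i}{N}, \qquad \hat{q}_j = \frac{C_j}{N}

% Step 3: outer product of the class column (3x1) and the survival row (1x2)
\hat{\mathbf{p}}\,\hat{\mathbf{q}}^{\mathsf{T}} =
\begin{pmatrix} \hat{p}_1 \\ \hat{p}_2 \\ \hat{p}_3 \end{pmatrix}
\begin{pmatrix} \hat{q}_0 & \hat{q}_1 \end{pmatrix}
=
\begin{pmatrix}
\hat{p}_1\hat{q}_0 & \hat{p}_1\hat{q}_1 \\
\hat{p}_2\hat{q}_0 & \hat{p}_2\hat{q}_1 \\
\hat{p}_3\hat{q}_0 & \hat{p}_3\hat{q}_1
\end{pmatrix}

% Expected count under independence
E_{ij} = N\,\hat{p}_i\,\hat{q}_j = \frac{R_i\,C_j}{N}

% Pearson chi-square statistic for an r x c table
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad \mathrm{df} = (r-1)(c-1)
```

For the 3×2 class-by-survival table, df = (3 - 1)(2 - 1) = 2.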

Can anyone please explain step 3 in detail, or more generally how expected values can be calculated from a dataset without using the chi-square function from `stats` (with an example, if possible)?
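One way to do the whole calculation by hand is sketched below, assuming only pandas and NumPy. `chi_square_independence` is a hypothetical helper written for this example, and the toy DataFrame uses made-up values with the same column names as the Titanic data. It computes E_ij = R_i · C_j / N as an outer product of the row and column totals (equivalent to step 3's proportions multiplied by N), then sums (O - E)² / E over all cells.

```python
import numpy as np
import pandas as pd


def chi_square_independence(df, row_col, col_col):
    """Hypothetical helper: Pearson chi-square statistic for independence, by hand.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    # Observed counts O_ij: one row per category of row_col, one column per col_col.
    observed = pd.crosstab(df[row_col], df[col_col]).to_numpy(dtype=float)
    n = observed.sum()                   # grand total N
    row_totals = observed.sum(axis=1)    # R_i
    col_totals = observed.sum(axis=0)    # C_j
    # E_ij = R_i * C_j / N, i.e. N * (R_i / N) * (C_j / N), built as an outer product.
    expected = np.outer(row_totals, col_totals) / n
    # Sum of (O - E)^2 / E over every cell of the table.
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof, expected


# Hypothetical toy data with the Titanic column names (values made up).
data = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3, 2, 1, 3, 2],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})

chi2, dof, expected = chi_square_independence(data, "Pclass", "Survived")
print(expected)   # expected counts under independence
print(chi2, dof)
```

On the real Titanic DataFrame the same call would be `chi_square_independence(data, "Pclass", "Survived")`. Turning the statistic into a p-value still needs the chi-square distribution's survival function, for example `scipy.stats.chi2.sf(chi2, dof)`. With only 10 made-up rows, several expected counts fall below 5, so the toy output only illustrates the arithmetic and is not a meaningful test.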

Thanks in advance

ApaarBawa
  • It seems like you really have a math question, rather than a programming question. Have you tried math.stackexchange.com? Do you understand what the code means, in conventional write-it-out-in-your-math-homework-binder terms? In particular, do you need it explained what the `@` symbol does in this context, or the `.T`? *What exactly is the question*? – Karl Knechtel Oct 09 '20 at 07:43
  • Yes, both `.T` and `@`, and also how they are used to calculate the expected values. Yes, it is a maths question linked with hypothesis testing. – ApaarBawa Oct 09 '20 at 07:59
  • Okay. Are you familiar with pandas and numpy? Did you try reading the documentation? – Karl Knechtel Oct 11 '20 at 03:31
