I am trying to make a hypothesis test of two categorical variables. If I summarize the data it looks like this:

               target
               0      1
 airbag   0  11129   669
          1  13907   511

target: 0 means that the person survived the car accident; 1 means that the person died in the accident.

airbag: 0 means that there was no airbag or it did not deploy; 1 means that an airbag deployed.

Now I state my hypothesis:

H0: The airbag variable and the target variable are independent.

H1: The airbag variable and the target variable are dependent.

I tried this with the chi-squared test, but I get a p-value of 0.0 and I am not sure whether I am doing everything correctly. This is what my code looks like:

from scipy.stats import chisquare
chisquare([669, 511], f_exp = [11129,13907])

And this is the outcome:

Power_divergenceResult(statistic=22734.991970453277, pvalue=0.0)

Is this normal, or am I doing something wrong?

Thanks in advance for any assistance!

Nick
  • Question has nothing to do with `machine-learning` or `jupyter-notebook` - kindly do not spam irrelevant tags (removed & replaced with `statistics` & `scipy`). – desertnaut Oct 26 '19 at 10:32
  • Your sample size is waaaaaAAAaaay too small to do any meaningful stats. That said, to answer your question: yes, those results look right for the data provided. – DrBwts Oct 26 '19 at 10:40
  • I am doing a machine learning project in jupyter notebook :) Thanks @DrBwts ! – Nick Oct 26 '19 at 11:15
  • *"I am trying to make a hypothesis test of two categorical variables."* What is the hypothesis that you want to test? Without knowing more about the actual question you want to answer, it is difficult for anyone to help you. If your data is a [contingency table](https://en.wikipedia.org/wiki/Contingency_table), and you want to test for association between the variables, you can use [`scipy.stats.chi2_contingency`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html). – Warren Weckesser Oct 27 '19 at 22:00
  • @DrBwts I don't see any basis whatever for the claim that sample size is too small to do any meaningful stats. This is not at all the case; there may be any number of reasons why meaningfulness of statistical analysis *might* be an issue here but sample size is not remotely an issue. Please clarify the basis of your statement. – Glen_b Oct 28 '19 at 03:18
  • @Nick You haven't stated a hypothesis to test, which makes it a little difficult to give good advice. Please clarify what you're trying to find out (without using the word "significance" or anything like it). It looks to me like what you did is probably incorrect (though it may not change the p-value much). – Glen_b Oct 28 '19 at 03:23
  • I have read through it properly and rescind my previous comment about sample size. Sorry, I read it too fast and was too quick to reply. – DrBwts Oct 30 '19 at 11:41
  • @Glen_b, sorry for the late response. I edited my question and added some more information and explanation. – Nick Nov 07 '19 at 19:19
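
Following the suggestion in the comments to use `scipy.stats.chi2_contingency`, here is a minimal sketch of how the test of independence might look for the 2x2 table in the question. Note that `scipy.stats.chisquare` is a goodness-of-fit test that compares one set of observed counts against expected counts, so passing one column of the table as `f_obs` and the other as `f_exp` does not test independence. The variable names below (`observed`, `p_value`, and so on) are illustrative assumptions, and the row and column orientation is taken from the table as posted.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts copied from the question.
# Rows: airbag = 0, airbag = 1. Columns: target = 0 (survived), target = 1 (died).
observed = np.array([
    [11129, 669],
    [13907, 511],
])

# chi2_contingency derives the expected counts under independence from the
# row and column totals. For a 2x2 table it applies Yates' continuity
# correction by default (pass correction=False to disable it).
chi2, p_value, dof, expected = chi2_contingency(observed)

print("chi-squared statistic:", chi2)
print("p-value:", p_value)
print("degrees of freedom:", dof)
print("expected counts under independence:")
print(expected)
```

With counts this large, the p-value can still come out extremely small, but it is then a statement about independence between the two variables rather than a comparison of one column against the other.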
