What does the p value represent in the scipy.stats chisquare function?

Question

I am using the scipy.stats.chisquare function, as explained on https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html

I understand that, in general, the p-value demonstrates how well the data supports the null hypothesis. A high p-value, usually greater than 0.05, suggests that the null hypothesis is true, and vice versa.

However, the p-value of the scipy.stats chisquare function gives me high p-values when my data is similar to the expected values, and low p-values when my data is different from the expected values.
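The behaviour described above can be reproduced with a small sketch. The category counts here are made-up illustrative numbers, not data from the question. Both observed vectors sum to the same total as the expected counts, because `chisquare` compares frequencies over the same total.

```python
from scipy.stats import chisquare

# Hypothetical expected counts for four categories (illustrative numbers only).
expected = [25, 25, 25, 25]

# Observed counts that closely match the expected counts.
close = [24, 26, 25, 25]
# Observed counts that differ strongly from the expected counts.
far = [10, 40, 15, 35]

res_close = chisquare(close, f_exp=expected)
res_far = chisquare(far, f_exp=expected)

print(res_close.pvalue)  # close to 1: data consistent with the expected counts
print(res_far.pvalue)    # very small: strong evidence against the expected counts
```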

I am expecting the p-value to be small when my data closely resembles the expected values, as this would indicate that the null hypothesis is false. Here I am taking the null hypothesis to be that my data does not resemble the expected values.

What does the p-value represent in this function?

A high p-value does not suggest that the null hypothesis is true. Rather, a high p-value means that our evidence is not inconsistent with the null hypothesis being true. — Duncan MacIntyre, Jul 08 '21 at 04:20

score 3 · Accepted Answer · answered Apr 09 '17 at 05:26


The answer is right in the documentation you linked to:

The chi-square test tests the null hypothesis that the categorical data has the given frequencies.

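In other words, the null hypothesis is the opposite of what you thought: it states that the observed counts match the expected frequencies. A high p-value therefore means your data are consistent with the expected values, and a low p-value is evidence that they differ.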
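To make the meaning of the p-value concrete, here is a minimal sketch that recomputes what `chisquare` returns by hand, assuming the default `ddof=0`. Pearson's statistic is the sum of (O - E)^2 / E over the categories; under the null hypothesis it approximately follows a chi-square distribution with k - 1 degrees of freedom, and the p-value is the probability of a statistic at least as large as the one observed. The counts are illustrative numbers only.

```python
import numpy as np
from scipy.stats import chi2, chisquare

# Illustrative observed and expected counts (made-up numbers).
observed = np.array([10, 40, 15, 35])
expected = np.array([25, 25, 25, 25])

# Pearson's chi-square statistic: sum of (O - E)^2 / E over categories.
stat = np.sum((observed - expected) ** 2 / expected)

# Under the null (observed frequencies match expected ones), the statistic
# approximately follows a chi-square distribution with k - 1 degrees of freedom.
dof = len(observed) - 1

# p-value: probability, under the null, of a statistic at least this large.
p_manual = chi2.sf(stat, dof)

res = chisquare(observed, f_exp=expected)
print(stat, res.statistic)
print(p_manual, res.pvalue)
```

A large statistic (observed far from expected) lands in the right tail and gives a small p-value; a statistic near zero gives a p-value near 1.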

answered Apr 09 '17 at 05:26 by BrenBarn
