Questions tagged [chi-squared]

Anything related to the chi-squared probability distribution or the chi-squared statistical test (typically a test of distribution, independence, or goodness of fit).

In probability theory and statistics, the chi-squared (χ²) distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics (for example, in hypothesis testing or in the construction of confidence intervals).
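The definition above can be checked numerically. This minimal NumPy sketch simulates sums of squared standard normals and compares the sample mean and variance with the known chi-squared values k and 2k. The seed, k and sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
k = 4               # degrees of freedom (illustrative choice)
n_draws = 200_000   # number of simulated chi-squared values

# Each row holds k independent standard normal draws.
# Squaring and summing across the row gives one chi-squared(k) draw.
z = rng.standard_normal(size=(n_draws, k))
q = (z ** 2).sum(axis=1)

# Theory: a chi-squared(k) variable has mean k and variance 2k.
print(f"sample mean {q.mean():.3f} (theory {k}), "
      f"sample variance {q.var():.3f} (theory {2 * k})")
```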

See also on Wikipedia:

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

643 questions
6 votes, 1 answer

How do we pass two datasets to scipy.stats.anderson_ksamp? Can anyone explain with an example?

The anderson function takes only one parameter, and that should be a 1-d array. So how do I pass two different arrays to be compared? Thanks
icm
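The following is a sketch of one way to do this. `scipy.stats.anderson` is the one-sample test against a named distribution. `scipy.stats.anderson_ksamp` instead takes a single argument that is a sequence of 1-d samples, so the two datasets are passed together in one list. The data below is simulated and purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=200)  # first (simulated) dataset
y = rng.normal(loc=0.5, scale=1.0, size=150)  # second dataset, shifted by 0.5

# Both samples go into ONE list argument; their sizes may differ.
statistic, critical_values, p_approx = stats.anderson_ksamp([x, y])
print(statistic)        # k-sample Anderson-Darling statistic
print(critical_values)  # critical values for the tabulated significance levels
print(p_approx)         # approximate p-value
```

SciPy floors and caps this approximate p-value at 0.1% and 25% and warns when the true value lies outside that range. With the shift used here, expect a "p-value floored" warning.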
6 votes, 2 answers

Test statistic (e.g. chisquare test) inside latex table using the tables-package in R/Knitr/Rstudio

I would like to use the tabular() function from the tables package to do a cross-tabulation of two variables (e.g. v1 and v2), and present the p-value of the chisq-test in the table. It is easy to get the cross-tabulation, but I can't get the p-value…
Rasmus Larsen
5 votes, 1 answer

understanding scipy.stats.chisquare

Can someone help me with scipy.stats.chisquare? I do not have a statistical / mathematical background, and I am learning scipy.stats.chisquare with this data set from https://en.wikipedia.org/wiki/Chi-squared_test The Wikipedia article gives the…
Christopher
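`scipy.stats.chisquare` reports the sum over categories of (observed - expected)² / expected. It compares that sum with a chi-squared distribution with (number of categories - 1) degrees of freedom. The sketch below reproduces both numbers by hand using made-up counts, not the Wikipedia data set. When `f_exp` is omitted, `chisquare` assumes equal expected frequencies, which is the same null hypothesis used here.

```python
import numpy as np
from scipy import stats

observed = np.array([30, 14, 34, 45])            # hypothetical counts in 4 categories
expected = np.full(4, observed.sum() / 4)        # equal expected counts under the null

# Pearson statistic and upper-tail p-value computed by hand.
stat_manual = ((observed - expected) ** 2 / expected).sum()
p_manual = stats.chi2.sf(stat_manual, df=len(observed) - 1)

# The same test through SciPy.
result = stats.chisquare(observed, f_exp=expected)
print(stat_manual, result.statistic)
print(p_manual, result.pvalue)
```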
5 votes, 2 answers

Chi-square test p-value from a resampling method vs scipy.stats.chi2_contingency

This question refers to the book "Practical Statistics for Data Scientists, 2nd Edition" (O'Reilly), chapter 3, section "Chi-Square Test". The book provides an example of a chi-square test in which it assumes a website with three different headlines…
user97662
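A resampling p-value and the `chi2_contingency` p-value can be compared on the same table. This minimal sketch uses hypothetical click counts, not the book's data. It shuffles click outcomes across visitors while keeping each headline's visitor count fixed, which is one common way to resample under the null of no headline effect. The two p-values should be close but not identical. One comes from the asymptotic chi-squared distribution; the other comes from a finite number of random permutations.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import expected_freq

# Hypothetical counts: row 0 = clicks, row 1 = no clicks; columns = headlines A, B, C.
table = np.array([[20, 9, 15],
                  [980, 991, 985]])

def pearson_stat(tab):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_freq(tab)
    return ((tab - expected) ** 2 / expected).sum()

observed_stat = pearson_stat(table)
# Yates' correction is only applied when dof == 1, so none is applied here (dof = 2).
chi2_stat, p_asymptotic, dof, _ = stats.chi2_contingency(table)

# Expand the table to one entry per visitor: headline index and click outcome.
visitors = table.sum(axis=0)
headline = np.repeat(np.arange(table.shape[1]), visitors)
clicked = np.concatenate([np.repeat([1, 0], table[:, j]) for j in range(table.shape[1])])

rng = np.random.default_rng(0)
n_perm = 2000
exceed = 0
for _ in range(n_perm):
    # Shuffle outcomes across visitors, then recount clicks per headline.
    clicks = np.bincount(headline, weights=rng.permutation(clicked),
                         minlength=table.shape[1])
    perm_table = np.vstack([clicks, visitors - clicks])
    exceed += pearson_stat(perm_table) >= observed_stat
p_resampled = exceed / n_perm
print(p_asymptotic, p_resampled)
```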
5 votes, 1 answer

Problem understanding chi-squared feature selection

I've been having a problem understanding chi-squared feature selection. I have two classes, positive and negative, each containing different terms and term counts. I need to perform chi-squared feature selection to extract the most representative…
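One common formulation scores each term with a 2x2 table of document counts: term present or absent, against positive or negative class. Terms are then ranked by the resulting Pearson chi-squared statistic. The sketch below assumes that document-count formulation and uses hypothetical counts; the helper `term_chi2` is illustrative. Implementations that work on raw term frequencies, such as scikit-learn's `chi2`, build the observed and expected values differently.

```python
import numpy as np
from scipy import stats

def term_chi2(a, b, c, d):
    """Chi-squared score of one term from a 2x2 table of document counts.

    a: positive docs containing the term    b: negative docs containing the term
    c: positive docs without the term       d: negative docs without the term
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical document counts per term: (a, b, c, d)
counts = {"excellent": (40, 5, 60, 95), "the": (98, 97, 2, 3)}
scores = {term: term_chi2(*abcd) for term, abcd in counts.items()}
print(scores)  # a higher score means a stronger association with one class

# Cross-check against SciPy's Pearson test on the same table (no Yates correction).
check = stats.chi2_contingency(np.array([[40, 5], [60, 95]]), correction=False)[0]
```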
5 votes, 4 answers

Using chi2 test for feature selection with continuous features (Scikit Learn)

I am trying to predict a binary (categorical) target from many continuous features, and would like to narrow my feature space before heading into model fitting. I noticed that the SelectKBest class from SKLearn's Feature Selection package has the…
vanchman
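scikit-learn's `chi2` scorer expects non-negative, count-like features and rejects negative values, so it is not directly meaningful for arbitrary continuous data. One workaround is to discretize each feature into quantile bins. You can then run a chi-squared test of independence between the bins and the binary target. This is a hedged sketch of that idea, not a recommended method; the data and the helper `binned_chi2` are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import crosstab

rng = np.random.default_rng(42)
n = 1000
y = rng.integers(0, 2, size=n)                 # binary target
informative = rng.normal(loc=y * 1.0, size=n)  # distribution shifts with the target
noise = rng.normal(size=n)                     # unrelated to the target

def binned_chi2(feature, target, n_bins=5):
    """Chi-squared test between quantile-binned feature and target (illustrative helper)."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(feature, edges[1:-1])   # bin labels 0 .. n_bins-1
    table = crosstab(bins, target).count
    stat, p, dof, _ = stats.chi2_contingency(table)
    return stat, p

inf_stat, inf_p = binned_chi2(informative, y)
noise_stat, noise_p = binned_chi2(noise, y)
print(inf_stat, inf_p)
print(noise_stat, noise_p)
```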
5 votes, 1 answer

Python: Chi Squared for categorical values in large dataset

I have no experience of note with Python, and am trying to use it for a statistical analysis of a very large dataset (10 million cases) because the other options (SPSS and R) are unable to handle the dataset on the authorized hardware. In this dataset,…
RROBINSON
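For very large columns, one memory-friendly approach has two steps:

1. Build the contingency table from NumPy integer codes with `np.bincount`.
2. Pass only that small table to `scipy.stats.chi2_contingency`.

The sketch below uses simulated codes in place of the real 10-million-row data; the helper `contingency_from_codes` is hypothetical.

```python
import numpy as np
from scipy import stats

def contingency_from_codes(a, b):
    """Count co-occurrences of two categorical columns without per-row Python objects."""
    a_levels, a_idx = np.unique(a, return_inverse=True)
    b_levels, b_idx = np.unique(b, return_inverse=True)
    flat = a_idx * len(b_levels) + b_idx  # one integer per (a, b) combination
    counts = np.bincount(flat, minlength=len(a_levels) * len(b_levels))
    return counts.reshape(len(a_levels), len(b_levels)), a_levels, b_levels

rng = np.random.default_rng(7)
n = 1_000_000                           # stand-in for the real dataset size
region = rng.integers(0, 3, size=n)     # hypothetical categorical codes
answer = rng.integers(0, 2, size=n)

table, region_levels, answer_levels = contingency_from_codes(region, answer)
stat, p, dof, expected = stats.chi2_contingency(table)
print(table)
print(stat, p, dof)
```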
5 votes, 1 answer

Calculating minuscule numbers for chi-squared distribution -- numerical precision

I am using the pchisq function in R to calculate the cumulative distribution function for the chi-squared distribution. I would like to calculate very small values, such that 1-pchisq(...) can have a value smaller than 2.2e-16 (which is the…
Vance
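The precision limit comes from subtracting a CDF value that has already rounded to 1. Computing the upper tail directly avoids that subtraction. In R this is `pchisq(q, df, lower.tail = FALSE)`, optionally with `log.p = TRUE`. The SciPy equivalent is sketched below, using an arbitrary statistic of 150 on 3 degrees of freedom.

```python
from scipy import stats

x, df = 150.0, 3   # illustrative statistic and degrees of freedom

naive = 1 - stats.chi2.cdf(x, df)    # cdf rounds to 1.0 in double precision, so this is 0.0
upper = stats.chi2.sf(x, df)         # survival function computes the upper tail directly
log_upper = stats.chi2.logsf(x, df)  # natural log of the upper tail, for even smaller values
print(naive, upper, log_upper)
```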
5 votes, 1 answer

Chi-square p value matrix in r

Is there any way to find the chi-square p-value matrix in 'R' (a matrix with the p-values between the attributes)? As an example, consider the iris data set. I am looking for a matrix as follows: | | Sepal length | Sepal width |…
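A chi-squared test needs categorical variables, so continuous measurements such as those in iris would have to be binned first. Assuming categorical columns, you can fill a p-value matrix by running `chi2_contingency` on every pair of columns. This Python sketch uses hypothetical simulated columns, with `shape` deliberately made dependent on `colour`. The diagonal is left as NaN because testing a variable against itself is not informative.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import crosstab

rng = np.random.default_rng(3)
n = 300
# Hypothetical categorical attributes; "shape" is made to depend on "colour".
columns = {
    "colour": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.choice(["S", "M", "L"], size=n),
}
columns["shape"] = np.where(columns["colour"] == "red", "round",
                            rng.choice(["round", "square"], size=n))

names = list(columns)
pvals = np.full((len(names), len(names)), np.nan)  # diagonal stays NaN
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        table = crosstab(columns[names[i]], columns[names[j]]).count
        p = stats.chi2_contingency(table)[1]
        pvals[i, j] = pvals[j, i] = p  # the matrix is symmetric

print(names)
print(np.round(pvals, 4))
```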
5 votes, 1 answer

Can someone tell me why R is not using the whole data.frame for this chisq.test?

I can't come up with a solution to a problem I've had when trying to create my own data.frame and run a quantitative analysis (such as a chisq.test) on it. The backdrop is as follows: I've summarized data I received relating to two hospitals. Both…
OFish
5 votes, 1 answer

Calculate Chi-square with NA values

I want to perform a chi-squared test between two variables with missing data. How can I do this? I've searched several times across different sources without success.
user2105555
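A common approach is complete-case analysis: drop every row in which either variable is missing, then cross-tabulate what remains. In R, `table()` excludes `NA` by default, so `chisq.test(table(x, y))` already behaves this way. The Python sketch below makes the dropping explicit, using hypothetical data coded as numbers with `np.nan` for missing values. The sample is far too small for a reliable chi-squared test and only illustrates the mechanics.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import crosstab

# Hypothetical paired observations; np.nan marks a missing value.
x = np.array([1, 2, 1, np.nan, 2, 1, 2, 2, 1, np.nan, 1, 2])
y = np.array([0, 1, 0, 1, np.nan, 0, 1, 1, 0, 0, 1, 1])

# Complete-case analysis: keep only rows where both values are present.
keep = ~np.isnan(x) & ~np.isnan(y)
table = crosstab(x[keep], y[keep]).count
stat, p, dof, expected = stats.chi2_contingency(table)
print(keep.sum(), table.tolist(), p)
```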
5 votes, 1 answer

Can we generate a contingency table for a chi-square test using Python?

I am using the scipy.stats.chi2_contingency method to get chi-square statistics. It requires a frequency table, i.e. a contingency table, as a parameter. But I have a feature vector and want to generate the frequency table automatically. Do we have any such…
icm
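SciPy 1.7 and later provide `scipy.stats.contingency.crosstab`, which counts every combination of values across two or more 1-d arrays. `pandas.crosstab` is an alternative if the data is already in a DataFrame. Here is a sketch with a hypothetical feature vector and class labels:

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import crosstab

# Hypothetical data: one categorical feature and a binary class label per sample.
feature = np.array(["a", "b", "a", "c", "b", "a", "c", "c", "a", "b"])
label = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

res = crosstab(feature, label)
(feature_levels, label_levels), table = res.elements, res.count
print(feature_levels, label_levels)  # sorted unique values along each axis
print(table)                         # rows = feature levels, columns = labels

# The generated table can be passed straight to chi2_contingency.
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)
```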
5 votes, 2 answers

Automate Chi-square across categories and columns

I have a survey dataframe containing several questions (columns) coded as 1=agree/0=disagree. Respondents (rows) are categorized according to metrics "age" ("young","middle","old"), "region" ("East","Mid","West"), etc. There are around 30 categories…
Graham Jones
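One way to automate this is a nested loop over grouping columns and question columns. Each iteration runs `chi2_contingency` on one cross-tabulation and appends the result to a list. The survey below is simulated; the column names only mirror those in the question.

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import crosstab

rng = np.random.default_rng(11)
n = 500
# Hypothetical survey: categorical grouping columns and 1=agree / 0=disagree answers.
groups = {
    "age": rng.choice(["young", "middle", "old"], size=n),
    "region": rng.choice(["East", "Mid", "West"], size=n),
}
questions = {f"q{i}": rng.integers(0, 2, size=n) for i in range(1, 4)}

results = []  # one (grouping column, question, statistic, p-value) tuple per pair
for group_name, group_values in groups.items():
    for question_name, answers in questions.items():
        table = crosstab(group_values, answers).count
        stat, p, dof, _ = stats.chi2_contingency(table)
        results.append((group_name, question_name, stat, p))

for row in results:
    print(row)
```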
5 votes, 1 answer

Python - Minimizing Chi-squared

I have been trying to fit a linear model to a set of stress/strain data by minimizing chi-squared. Unfortunately using the code below is not correctly minimizing the chisqfunc function. It is finding the minimum at the initial conditions, x0, which…
Will282
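For comparison, here is a minimal working sketch of a chi-squared straight-line fit with `scipy.optimize.minimize`. It uses hypothetical stress/strain values and a constant uncertainty `sigma`. A weighted linear least-squares fit minimizes the same quantity, so `np.polyfit` with weights `1/sigma` serves as a cross-check. If `minimize` returns `x0` unchanged, check that the objective actually depends on the parameter vector it receives.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stress/strain measurements with a constant uncertainty sigma.
strain = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
stress = np.array([0.1, 2.1, 3.9, 6.2, 7.9, 10.1])
sigma = np.full_like(stress, 0.2)

def chisq(params):
    """Chi-squared of the straight line stress = slope * strain + intercept."""
    slope, intercept = params
    model = slope * strain + intercept
    return np.sum(((stress - model) / sigma) ** 2)

res = minimize(chisq, x0=[1.0, 0.0])
slope, intercept = res.x
print(res.success, slope, intercept, res.fun)

# Weighted least squares minimizes the same quantity; weights multiply unsquared residuals.
ref_slope, ref_intercept = np.polyfit(strain, stress, deg=1, w=1 / sigma)
```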
5 votes, 1 answer

Python scipy chisquare returns different values than R chisquare

I am trying to use scipy.stats.chisquare. I have built a toy example:
In [1]: import scipy.stats as sps
In [2]: import numpy as np
In [3]: sps.chisquare(np.array([38,27,23,17,11,4]), np.array([98, 100, 80, 85,60,23]))
Out[11]: (240.74951271813072,…
gc5
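The two calls may be answering different questions. `scipy.stats.chisquare(f_obs, f_exp)` is a goodness-of-fit test that treats `f_exp` as expected counts. It therefore only makes sense when both arrays have the same total. Here the totals are 120 and 446, and recent SciPy versions raise an error when they differ.

The sketch below runs two tests on the same arrays, each with a direct R counterpart:

- Rescaling the second vector to proportions, like R's `chisq.test(x, p = y, rescale.p = TRUE)`.
- A 2x6 test of homogeneity, like `chisq.test(rbind(x, y))`.

Which of these matches the R output depends on the R call used, which the excerpt does not show.

```python
import numpy as np
from scipy import stats

x = np.array([38, 27, 23, 17, 11, 4])
y = np.array([98, 100, 80, 85, 60, 23])

# (a) Goodness of fit: use y only as a shape, rescaled to x's total.
expected = y / y.sum() * x.sum()
gof = stats.chisquare(x, f_exp=expected)

# (b) Homogeneity: do x and y come from the same category distribution?
stat, p, dof, _ = stats.chi2_contingency(np.vstack([x, y]))

print(gof.statistic, gof.pvalue)
print(stat, p, dof)
```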