Questions tagged [chi-squared]

Anything related to the chi-squared probability distribution or a chi-squared statistical test (typically a test of distribution, independence, or goodness of fit).

In probability theory and statistics, the chi-squared (χ²) distribution with k degrees of freedom is the distribution of the sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics (for example, in hypothesis testing or in the construction of confidence intervals).
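
The definition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `chi_squared_sample` and `simulate`, the seed, and the sample size are illustrative assumptions, not part of any library. It draws values as sums of k squared standard normals and compares the sample mean and variance with the known theoretical values k and 2k.

```python
import random
import statistics


def chi_squared_sample(k, rng):
    """Draw one value as the sum of squares of k independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def simulate(k, n, seed=0):
    """Return n simulated chi-squared values with k degrees of freedom."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [chi_squared_sample(k, rng) for _ in range(n)]


samples = simulate(k=3, n=100_000)
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# For k degrees of freedom the theoretical mean is k and the variance is 2k,
# so with k = 3 these should land near 3 and 6.
print(f"mean={sample_mean:.3f}, variance={sample_var:.3f}")
```

In practice a library routine (for example `rchisq` in R or `numpy.random.Generator.chisquare` in NumPy) would normally be used; the loop here only illustrates the definition.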

Tag usage

Questions on this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

643 questions
4 votes, 1 answer

Pandas: outer product of row and col sums

In Pandas, I am trying to manually code a chi-square test. I am comparing row 0 with row 1 in the dataframe below. data 2 3 5 10 30 0 3 0 6 5 0 1 33324 15833 58305 54402 38920 For this, I need…
Zhubarb
4 votes, 2 answers

Working between Java and R

I am trying to pass a double array into R, sum its values, and return it to Java. Here is what I am trying to do in Java: import org.rosuda.JRI.REXP; import org.rosuda.JRI.Rengine; // Start R session. Rengine re = new Rengine (new String []…
user1830307
4 votes, 3 answers

Mutual Information and Chi Square relationship

I've used the following code to compute the Mutual Information and Chi Square values for feature selection in Sentiment Analysis. MI = (N11/N)*math.log((N*N11)/((N11+N10)*(N11+N01)),2) + (N01/N)*math.log((N*N01)/((N01+N00)*(N11+N01)),2) +…
4 votes, 3 answers

Distribution of bytes within jpeg files

When observing compressed data, I expect an almost uniformly distributed byte stream. When using the chi-square test to measure the distribution, I get this result, e.g., for ZIP files and other compressed data, but not for JPG files. Last days I…
3 votes, 1 answer

How to run a chisq.test() with this data?

I have these data: > dput(df) structure(list(Freq = c(41L, 31L, 11L, 0L), group = structure(c(1L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), Survived = structure(c(2L, 1L, 2L, 1L), .Label = c("No", "Yes"), class = "factor")), row.names…
Ben
3 votes, 3 answers

Generate random numbers of a chi squared distribution in R

I want to generate a chi-squared distribution with 100,000 random numbers and 3 degrees of freedom. This is what I have tried: df3=data.frame(X=dchisq(1:100000, df=3)) But the output is not what I expected. I used the code below to visualize…
user11607046
3 votes, 3 answers

Running multiple chi-squared tests for different categories

I have binary data indicating whether an individual passed or failed a test, as well as characteristic information (e.g. gender) and what department they belonged to (e.g. x, y, z) in df(data) head(data,9) department gender pass x Male …
3 votes, 1 answer

Formatting data for a chi square test in R

I am trying to reformat my data to run a chi-square test in R. My data is set up with my independent variable in one column and the counts of my independent variable groups in two other columns. I made an example of my data format here. > example <-…
Emma Beck
3 votes, 1 answer

How to identify error with scipy.stats.chisquare returns negative values?

I am using Spyder 3.1.3 with Python 3.6.8 under Windows 10, with SciPy 1.2.1. I want to get the chi-square value but noticed that negative values are returned. Why is that? from scipy.stats import chisquare chisquare(f_obs=[2,1],…
lsamarahan
3 votes, 1 answer

Chi square test error "Chi-squared approximation may be incorrect"

I ran a chi-squared test in R and the results are: crianza = matrix(c(1,1,0,12,12,7,2,1,0,0,1,0,0,0,5, 0,0,0,1,1,2,0,0,3,0,0,0,13,35,29,0,0,1,10, 0,0,1,0,0,0,0,0),ncol=3,byrow=TRUE) colnames (crianza) =…
JSalazar
3 votes, 1 answer

how to understand the chi square contingency table

I have a few categorical features: ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area'] from scipy.stats import chi2_contingency chi2, p, dof, expected = chi2_contingency((pd.crosstab(df.Gender,…
Jeeth
3 votes, 1 answer

Having trouble converting r chisquare goodness of fit test code to python equivalent

UCLA has this great site for statistical tests https://stats.idre.ucla.edu/r/whatstat/what-statistical-analysis-should-i-usestatistical-analyses-using-r/#1sampt but the code is all in R. I am trying to convert the code to Python equivalents but it…
3 votes, 1 answer

R's qchisq in Python with the log.p argument?

Non-statistician here trying to replicate some code in Python. R has the function qchisq. qchisq(c(0.5, 0.1, 0.99), df=1, lower.tail=F) # [1] 0.4549364231 2.7055434541 0.0001570879 It can be replicated in Python like so: from scipy.special import…
The Unfun Cat
3 votes, 2 answers

add p-values of Pearson's chi-squared test to facet ggplots

I compare categorical data from three different groups. I wonder if it is possible to easily add p-values of chi-squared tests to facet ggplots (since I am analyzing a big data set). I just read that there is a marvelous way to do so when comparing…
captcoma
3 votes, 1 answer

Estimate the needed sample size for a Chi Squared test

I want to estimate the sample size needed to compute a chi-squared test (test for homogeneity) for discrete data using Python and need a hint on how to do it. In general I want to estimate if the failure rates of two production processes differ…
2Obe