
I'm having trouble using the chisq.test command in R: I get different and strange results depending on how I pass the data.

Let's say I have the following table named t:

> t    
   data1   data2   data3   data4   data5
    1487    3301    2983    2432    6151
    1296    1519    1354    1244    3139
    1169     867     837     916    2191
    1372     681     802    1065    1749
    1497     630     962    1256    1304
    1502     544    1097    1380     942
    1344     477    1200    1410     673
    1031     346    1199    1286     347
     705     172     975     980     170
     542      90     919     770      66
     276      26    1005     604      10

I'm running chi-squared tests between columns, but I don't understand the results.

When I run chisq.test(x=t[,1], y=t[,2]), I get:

X-squared = 110, df = 100, p-value = 0.2322

which is the same result as when I run:

data1 <- c(1487, 1296, 1169, 1372, 1497, 1502, 1344, 1031, 705, 542, 276)
data2 <- c(3301, 1519, 867, 681, 630, 544, 477, 346, 172, 90, 26)
chisq.test(x=data1, y=data2)

But it is different from:

t2 <- matrix(c(data1, data2), ncol=11, nrow=2, byrow=T)
chisq.test(t2)
X-squared = 2865.8, df = 10, p-value < 2.2e-16

Judging by the degrees of freedom, I guess the last one is correct, but what is happening here? Moreover, I get the same p-values whichever columns I choose to use in the test.

Micawber

1 Answer

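Actually, with your third chisq.test call you are putting data1 and data2 into a single 2 x 11 matrix and passing it as x, with y = NULL. When x is a matrix, R ignores y and treats x as a contingency table, which is where df = (2 - 1) * (11 - 1) = 10 comes from. To be exact, you are doing the following with your last chisq.test command: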

 t2 <- matrix(c(data1, data2), ncol=11, nrow=2, byrow=T)
 chisq.test(x = t2, y = NULL)

Which gives:

 Pearson's Chi-squared test
 data:  t2
 X-squared = 2865.8, df = 10, p-value < 2.2e-16
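
For comparison, here is a minimal sketch of what the two-vector form appears to compute, assuming the behaviour described in ?chisq.test: when x and y are both vectors, they are coerced to factors and cross-tabulated. The names tab, counts, res_vec and res_mat are only illustrative. Because every count in a column is distinct, the cross-table is 11 x 11 with a single 1 in each row and column, which would explain both df = 100 and why the p-value does not change with the columns you pick, as long as the values within each column are all distinct.

```r
data1 <- c(1487, 1296, 1169, 1372, 1497, 1502, 1344, 1031, 705, 542, 276)
data2 <- c(3301, 1519, 867, 681, 630, 544, 477, 346, 172, 90, 26)

# Two-vector form: the counts are used as category labels, not as counts.
tab <- table(data1, data2)
dim(tab)  # 11 11
sum(tab)  # 11, i.e. one "observation" per pair of values

# Both calls test the same 11 x 11 table of zeros and ones.
# suppressWarnings() hides the "approximation may be incorrect" warning
# caused by the tiny expected counts.
res_vec <- suppressWarnings(chisq.test(x = data1, y = data2))
res_tab <- suppressWarnings(chisq.test(tab))

# Matrix form: the values are used as counts in a 2 x 11 contingency table.
counts <- rbind(data1, data2)
res_mat <- chisq.test(counts)

res_vec$parameter  # df = 100
res_mat$parameter  # df = 10
```

So the first form tests whether the labels in data1 are associated with the labels in data2, while the matrix form compares the two distributions of counts across the 11 rows. Whether a chi-squared test suits these data at all is a separate question.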
Lennyy
  • All right, but then, if I want to test whether the two data series are significantly different, what is the correct way to do it? I'm kind of lost when it comes to statistics (even the basics :/) – Micawber May 07 '18 at 12:38
  • If you want to use chisq.test, your third approach was wrong for the reasoning above. Whether you actually need a chisq.test at all depends on other aspects as well, such as the context of your data, but in a comment of at most 500 characters it is hard to give you much guidance there. Also, a lot has already been written about statistics on the internet and in books. At a glance, I'd suggest you look into the characteristics of t-tests. – Lennyy May 07 '18 at 12:48
  • Is a Wilcoxon rank-sum test suitable then? – Micawber May 07 '18 at 12:54
  • @Micawber, your questions seem to be more appropriate in http://stats.stackexchange.com/ – hpesoj626 May 07 '18 at 13:03