Post-Hoc tests for chi-sq in R

Question

I have a table that looks like this.

> dput(theft_loc)
structure(c(13704L, 14059L, 14263L, 14450L, 14057L, 15503L, 14230L, 
16758L, 15289L, 15499L, 16066L, 15905L, 18531L, 19217L, 12410L, 
13398L, 13308L, 13455L, 13083L, 14111L, 13068L, 19569L, 18771L, 
19626L, 20290L, 19816L, 20923L, 20466L, 20517L, 19377L, 20035L, 
20504L, 20393L, 22409L, 22289L, 7997L, 8106L, 7971L, 8437L, 8246L, 
9090L, 8363L, 7934L, 7874L, 7909L, 8150L, 8191L, 8746L, 8277L, 
27194L, 25220L, 26034L, 27080L, 27334L, 30819L, 30633L, 10452L, 
10848L, 11301L, 11494L, 11265L, 11985L, 11038L, 12104L, 13368L, 
14594L, 14702L, 13891L, 12891L, 12939L), .Dim = c(7L, 10L), .Dimnames = structure(list(
    c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", 
    "Friday", "Saturday"), c("BAYVIEW", "CENTRAL", "INGLESIDE", 
    "MISSION", "NORTHERN", "PARK", "RICHMOND", "SOUTHERN", "TARAVAL", 
    "TENDERLOIN")), .Names = c("", "")), class = "table")

I ran a chisq.test and the results came back significant. I now want to run some pairwise tests to see where the significance lies. I tried to use the fifer package and the chisq.post.test function but i get an error that says out of workspace.

What other ways can I run the multiple comparisons test?

score 6 · Accepted Answer · answered Feb 07 '17 at 19:12

6

This will work (try chisq.test instead of the default fisher.test (exact) in post hoc test):

(Xsq <- chisq.test(theft_loc))  # Prints test summary, p-value very small,
#       Pearson's Chi-squared test
# data:  theft_loc
# X-squared = 1580.1, df = 54, p-value < 2.2e-16 # reject null hypothesis for independence

library(fifer)
chisq.post.hoc(theft_loc, test='chisq.test')

with output

 Adjusted p-values used the fdr method.

               comparison  raw.p  adj.p
1       Sunday vs. Monday 0.0000 0.0000
2      Sunday vs. Tuesday 0.0000 0.0000
3    Sunday vs. Wednesday 0.0000 0.0000
4     Sunday vs. Thursday 0.0000 0.0000
5       Sunday vs. Friday 0.0000 0.0000
6     Sunday vs. Saturday 0.0000 0.0000
7      Monday vs. Tuesday 0.0000 0.0000
8    Monday vs. Wednesday 0.0000 0.0000
9     Monday vs. Thursday 0.0000 0.0000
10      Monday vs. Friday 0.0000 0.0000
11    Monday vs. Saturday 0.0000 0.0000
12  Tuesday vs. Wednesday 0.1451 0.1451
13   Tuesday vs. Thursday 0.0000 0.0000
14     Tuesday vs. Friday 0.0000 0.0000
15   Tuesday vs. Saturday 0.0000 0.0000
16 Wednesday vs. Thursday 0.0016 0.0017
17   Wednesday vs. Friday 0.0000 0.0000
18 Wednesday vs. Saturday 0.0000 0.0000
19    Thursday vs. Friday 0.0000 0.0000
20  Thursday vs. Saturday 0.0000 0.0000
21    Friday vs. Saturday 0.0000 0.0000

As we can see, all the pairwise tests except a couple are significant, we can use different p-value-correction too (by changing the control from default fdr to bonferroni).

answered Feb 07 '17 at 19:12

Sandipan Dey

21,482
2
51
63

1

Perfect!! No wonder why it wasn't working. Is there a way to compare days of the week vs pdDistrict? My main question is trying to understand where the differences lie in the data. Is there a day and a location that has higher crime. – Ted Mosby Feb 07 '17 at 19:18
Well, with the post hoc test I think we can conclude which pairs are significant, in order to measure which day / location has higher crime we need a different test. – Sandipan Dey Feb 07 '17 at 19:32
Do you just want to visulaize the day and location that has higher crime? Then this may work : 1) df<-tbl_df(theft_loc) 2) colnames(df)<-c("day","place","n") 3)ggplot(df,aes(x=place,y=n))+geom_bar(stat="identity")+facet_grid(.~day)+theme(axis.text.x=element_text(angle=90)) – thisisrg Feb 07 '17 at 19:45
@thisisrg I think OP statistically wants to conclude higher crime for that he probably needs to do some kind of `hypothesis test` such as `binomial test`. – Sandipan Dey Feb 07 '17 at 19:47
2

I see... so maybe 1) aov(n ~ place+day, data = df)->fit , 2) TukeyHSD(fit) using the same df as the above comment. – thisisrg Feb 07 '17 at 19:53
1

just realized an underused ggplot facet trick. You can see the crimes for a day/place combo like so: ggplot(df,aes(x=place,y=n))+geom_bar(stat="identity")+facet_grid(.~day+place,scale="free")+theme(axis.text.x=element_text(angle=90))+theme(strip.text.x=element_text(angle=90)). Southern has high crime for all days. – thisisrg Feb 07 '17 at 20:00
2

Would it make more sense to do a GLM with a poisson distribution to determine where the higher frequencies occur? `glm(freq~PdDistrict+DayOfWeek,df,family ='poisson')` – Ted Mosby Feb 07 '17 at 21:09
don't see why not. For a more tidy output: 1) library(broom) 2) glm(freq~PdDistrict+DayOfWeek,df,family ='poisson')->glmfit 3)tidy(glmfit). Finally I think this is what you wanted. Solved? :). – thisisrg Feb 07 '17 at 23:08
This answer doesn't work for R 3.5.0 "Warning in install.packages : package ‘fifer’ is not available (for R version 3.5.0)" – Nakx Feb 02 '19 at 07:19
@Nakx you can install fifer with the source compressed file from here https://cran.r-project.org/src/contrib/Archive/fifer/ – Sandipan Dey Feb 02 '19 at 08:23
it is a shame that fifer is no longer maintained, but if you want to install install.packages("https://cran.r-project.org/src/contrib/Archive/fifer/fifer_1.1.tar.gz",repo=NULL,type="source") – Dimitrios Zacharatos Feb 24 '20 at 10:08

score 5 · Answer 2 · answered Feb 02 '19 at 07:59

Since fifer is no longer maintained here is a solution with RVAideMemoire (described in more detail here https://rdrr.io/cran/RVAideMemoire/src/R/chisq.multcomp.R):

install.packages("RVAideMemoire")
library(RVAideMemoire)
chisq.multcomp(theft_loc, p.method = "none")
>      7874    7909    7934    7971    7997    8106    8150    8191    8246    8277    8363    8437    8746    9090    10452   10848   11038   11265   11301  
7909  0.78056 -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -      
7934  0.63321 0.84256 -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -      
7971  0.44095 0.62272 0.76923 -       -       -       -       -       -       -       -       -       -       -       -       -       -       -       -      
7997  0.32889 0.48533 0.61768 0.83698 -       -       -       -       -       -       -       -       -       -       -       -       -       -       -      
8106  0.06647 0.11954 0.17444 0.28701 0.39036 -       -       -       -       -       -       -       -       -       -       -       -       -       -      
8150  0.02923 0.05720 0.08854 0.15860 0.22857 0.73002 -       -       -       -       -       -       -       -       -       -       -       -       -      
8191  0.01238 0.02625 0.04298 0.08354 0.12732 0.50552 0.74841 -       -       -       -       -       -       -       -       -       -       -       -      
8246  0.00339 0.00802 0.01417 0.03081 0.05073 0.27360 0.45342 0.66793 -       -       -       -       -       -       -       -       -       -       -      
8277  0.00152 0.00382 0.00706 0.01637 0.02817 0.18156 0.32174 0.50276 0.80943 -       -       -       -       -       -       -       -       -       -      
8363  0.00012 0.00037 0.00078 0.00216 0.00422 0.04522 0.09741 0.18128 0.36396 0.50497 -       -       -       -       -       -       -       -       -      
8437  1.0e-05 3.6e-05 8.5e-05 0.00027 0.00060 0.01007 0.02585 0.05643 0.13921 0.21586 0.56805 -       -       -       -       -       -       -       -      
8746  1.3e-11 8.8e-11 3.2e-10 2.0e-09 7.1e-09 8.2e-07 4.5e-06 2.0e-05 0.00013 0.00032 0.00341 0.01841 -       -       -       -       -       -       -      
9090  < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 6.2e-14 8.1e-13 8.0e-12 1.5e-10 6.9e-10 3.7e-08 8.1e-07 0.01000 -       -       -       -       -       -      
10452 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 < 2e-16 -       -       -       -       -

Category are replaced by the counts of each category. I don't like corrections for multiple comparisons (see ref below for a discussion), but fdr is available.

Moran, M. D. (2003). Arguments for rejecting the sequential Bonferroni in ecological studies. Oikos.

again RVAideMemoire depends on mixOmics and this is not available in R version 3.6.2 I am using. You have to use install packages("package_url",repo=NULL,type="source") — Dimitrios Zacharatos, Feb 24 '20 at 10:12
thanks, I don't know why such a basic thing isn't implemented elsewhere, maybe it is trivial to write it. — Nakx, Feb 24 '20 at 10:41

Post-Hoc tests for chi-sq in R

2 Answers2

Linked