I have a data frame similar to the example below (which is a small extract of my actual data frame).
frequencies <- data.frame(sex=c("female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male", "female", "female", "male", "male"),
ecotype=c("Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367", "Contig100169_2367",
"Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481", "Contig100169_2481"),
allele=c("p", "p", "p", "p", "q", "q", "q", "q", "p", "p", "p", "p", "q", "q", "q", "q"),
frequency=c(157, 98, 140, 65, 29, 8, 26, 9, 182, 108, 147, 80, 46, 4, 49, 4))
I would like to do separate chi-square contingency tests for each combination of ‘contig_ID’ and ‘ecotype’, testing the association between ‘sex’ and ‘allele’. I would then like to summarise the results of these in a table that includes the p value for each combination of ‘contig_ID’ and ‘ecotype’. For instance, from the example table given, I would expect a results table of 4 p values like the example below.
results <- data.frame(ecotype=c("Crab", "Wave", "Crab", "Wave"),
contig_ID=c("Contig100169_2367", "Contig100169_2367", "Contig100169_2481", "Contig100169_2481"),
pvalue=c("pval", "pval", "pval", "pval"))
Alternatively, just adding a p value column to the original table would also work, with the p value for each combination just repeated in all the relevant rows.
I have been attempting to use functions such as lapply() and summarise() in combination with chisq.test() to achieve this, but have had no luck so far. I have also attempted to use a method similar to this: R chi squared test (3x2 contingency table) for each row in a table, but couldn't make this work either.
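For reference, here is a minimal base-R sketch of the kind of per-group test described above. It is one possible approach, not necessarily the best one. It rebuilds the example data with rep() so the block runs on its own, splits the rows by ecotype and contig_ID, builds a 2x2 sex-by-allele table with xtabs(), and collects the p value from chisq.test(). The helper name run_test is made up for illustration. Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default, and may warn about the chi-squared approximation when expected counts are small, which looks likely for the Wave groups here.

```r
# Rebuild the example data (same values as the data frame in the question).
frequencies <- data.frame(
  sex       = rep(rep(c("female", "male"), each = 2), 4),
  ecotype   = rep(c("Crab", "Wave"), 8),
  contig_ID = rep(c("Contig100169_2367", "Contig100169_2481"), each = 8),
  allele    = rep(rep(c("p", "q"), each = 4), 2),
  frequency = c(157, 98, 140, 65, 29, 8, 26, 9,
                182, 108, 147, 80, 46, 4, 49, 4)
)

# Hypothetical helper: one chi-square test on the sex x allele table of a subset.
run_test <- function(d) {
  tab <- xtabs(frequency ~ sex + allele, data = d)  # 2x2 table of counts
  data.frame(ecotype   = d$ecotype[1],
             contig_ID = d$contig_ID[1],
             pvalue    = chisq.test(tab)$p.value)   # Yates-corrected by default
}

# One subset per ecotype/contig_ID combination.
groups <- split(frequencies,
                list(frequencies$ecotype, frequencies$contig_ID),
                drop = TRUE)

results <- do.call(rbind, lapply(groups, run_test))
rownames(results) <- NULL

# Optional: repeat each p value on the matching rows of the original table.
frequencies_with_p <- merge(frequencies, results)
```

Here results has one row per ecotype/contig_ID combination, and frequencies_with_p is the original table with a pvalue column added. Passing correct = FALSE to chisq.test() would turn off the continuity correction, and fisher.test() could be swapped in if the small counts are a concern.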