I have data that looks like this:
SNP aa_controls aA_controls AA_controls aa_cases aA_cases AA_cases
rs2378938 3412 16822 21987 2635 13197 16573
rs6712069 87 3354 38780 58 2659 29688
rs62445806 2306 15116 24799 1781 11497 19127
This continues for ~14k SNPs.
I want to test, for each SNP, whether one or both of the two alleles are associated with a higher risk of a disease. So, logically, I thought I would first create a contingency table for each SNP that looks like this:
        aa  aA  AA
case     #   #   #
control  #   #   #
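As a minimal sketch of that reshaping step, assuming each row is whitespace-separated and the columns are in the order shown in the header above, one row could be turned into a 2x3 table like this (the function name `row_to_table` is just an illustrative choice):

```python
def row_to_table(line):
    """Turn one whitespace-separated data row into (snp_id, 2x3 table).

    Assumed column order, taken from the header in the question:
    SNP aa_controls aA_controls AA_controls aa_cases aA_cases AA_cases
    """
    fields = line.split()
    snp = fields[0]
    counts = [int(x) for x in fields[1:7]]
    controls, cases = counts[:3], counts[3:]
    # Rows: case, control; columns: aa, aA, AA
    return snp, [cases, controls]


snp, table = row_to_table("rs2378938 3412 16822 21987 2635 13197 16573")
print(snp)
for label, row in zip(("case", "control"), table):
    print(label, row)
```

The nested list keeps cases in the first row and controls in the second, matching the layout sketched above.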
With these tables I can then perform a chi-squared test for each SNP. However, I am having trouble reshaping the data so that a contingency table can be made for each row. After that, I want to apply a chi-squared test to each table and store each p-value in a vector.
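One possible way to do the whole thing, sketched in plain Python under the assumption that the rows are whitespace-separated in the column order shown above: compute the Pearson chi-squared statistic for each 2x3 table and collect the p-values keyed by SNP. For a 2x3 table there are 2 degrees of freedom, and for 2 degrees of freedom the chi-squared upper tail probability is exactly exp(-x/2), so no statistics library is needed here. The names `chi2_pvalue_2x3` and `p_values` are hypothetical; in practice a library routine such as `scipy.stats.chi2_contingency` would be the more usual choice.

```python
import math


def chi2_pvalue_2x3(table):
    """Pearson chi-squared test of independence for a 2x3 table of counts.

    Returns (statistic, p_value). Returns (nan, nan) if any row or column
    total is zero, because the expected counts are then undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        return float("nan"), float("nan")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, where the upper tail is exp(-stat / 2)
    return stat, math.exp(-stat / 2)


# The three example rows from the question; a real run would read ~14k lines.
rows = [
    "rs2378938 3412 16822 21987 2635 13197 16573",
    "rs6712069 87 3354 38780 58 2659 29688",
    "rs62445806 2306 15116 24799 1781 11497 19127",
]

p_values = {}
for line in rows:
    fields = line.split()
    controls = [int(x) for x in fields[1:4]]  # aa, aA, AA in controls
    cases = [int(x) for x in fields[4:7]]     # aa, aA, AA in cases
    _, p = chi2_pvalue_2x3([cases, controls])
    p_values[fields[0]] = p

for snp, p in p_values.items():
    print(snp, p)
```

A dictionary keeps each p-value attached to its SNP ID; `list(p_values.values())` gives a plain vector of p-values in input order if that is all that is needed.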