Convert space separated text file into named vectors to calculate HWE

Question

I'm dealing with text files and vectors.

I have a space separated text file with the following format:

id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10
...

And I need a code that loops through each line and put the id in a variable and the rest of the values in a vector. Example of a vector corresponding to the first line:

x <- id1
y <- c(AA=40,AG=20,GG=36)

Edit: I need to use HWChisq function from HardyWeinberg package to exclude SNPs that have p-value < 0.001. Function requires named vector of counts for each allele.

Looks like genotypes data, and numbers are percentages of alleles (sum is 100)? Could you tell us why do you need this data in that format, what is the next step? — zx8754, Mar 21 '20 at 20:41
@zx8754 Yes you are right! But the numbers are the count of alleles not the percentage. I need to use HWChisq function from HardyWeinberg package to exclude SNPs that have p-value < 0.001 — NST, Mar 21 '20 at 20:44
@NST if you can show a `dput` of modified example, it would be great — akrun, Mar 21 '20 at 20:52

score 2 · Answer 1 · answered Mar 21 '20 at 20:34

If we have alternate columns (assuming we have an object created in R by reading the .csv file with read.csv/read.table), then split by row with asplit excluding the first column 'id' column, and create a named vector with setNames

lst1 <- Map(setNames, asplit(df1[-1][c(FALSE, TRUE)], 1), 
         asplit(df1[-1][c(TRUE, FALSE)], 1))
names(lst1) <- df1[[1]]
lst1$id1
# AA AG GG 
# 44 20 36

data

df1 <- structure(list(id = c("id1", "id2", "id3"), v1 = c("AA", "CC", 
"AA"), v2 = c(44L, 30L, 60L), v3 = c("AG", "CT", "AG"), v4 = c(20L, 
22L, 30L), v5 = c("GG", "TT", "GG"), v6 = c(36L, 48L, 10L)), 
class = "data.frame", row.names = c(NA, 
-3L))

Thank you akurn. The columns are not alternate, it's just an example. I will modify the question. — NST, Mar 21 '20 at 20:41

zx8754 · Accepted Answer · 2020-03-23T07:31:37.587

Loop through row by row, then apply the HWE function:

library("HardyWeinberg")

# data
df1 <- read.table(text = "
id1 AA 44 AG 20 GG 36
id2 CC 30 CT 22 TT 48
id3 CT 60 CC 30 TT 10", header = FALSE, stringsAsFactors = FALSE)

out <- apply(df1[, c(3, 5, 7)], 1, function(i){
  x <- HWChisq(setNames(i, c("AA", "AB", "BB")), verbose = FALSE)
  x$pval
})

# [1] 5.774374e-09 1.182236e-07 7.434226e-02

Pretty output:

cbind(df1, HWE = out)
#    V1 V2 V3 V4 V5 V6 V7          HWE
# 1 id1 AA 44 AG 20 GG 36 5.774374e-09
# 2 id2 CC 30 CT 22 TT 48 1.182236e-07
# 3 id3 CT 60 CC 30 TT 10 7.434226e-02

To calculate the HWE for X-chromosome see vignette:

4. X-chromosomal tests for Hardy-Weinberg equilibrium

Recently, Graffelman and Weir (2016) have proposed specific tests for HWE for bi-allelic markers on the X-chromosome. These tests take both males and females into account. The X-chromosomal tests can be carried out by the same functions mentioned in the previous Section (HWChisq, HWLratio, HWExact, HWPerm) and adding the argument x.linked=TRUE to the function call.

I have many SNPs that has two genotypes only. How should use HWChisq test for them? your help is highly appreciated. — NST, Mar 22 '20 at 13:53
@NST See the edit, it is explained in the vignette, please read the manuals. — zx8754, Mar 23 '20 at 07:32

Convert space separated text file into named vectors to calculate HWE

2 Answers2

data