I'm working with GWAS data.

Using the PLINK command, I was able to generate SNPslist, SNPs.map, and SNPs.ped.

Here are the data files and commands I have for 2 SNPs (rs6923761, rs7903146):

$ cat SNPs.map 
0   rs6923761   0   0
0   rs7903146   0   0

$ cat SNPs.ped
6 6 0 0 2 2 G G C C
74 74 0 0 2 2 A G T C
421 421 0 0 2 2 A G T C
350 350 0 0 2 2 G G T T
302 302 0 0 2 2 G G C C

Bash commands I used:

echo -n IID > SNPs.csv
cat SNPs.map | awk '{printf ",%s", $2}' >> SNPs.csv
echo >> SNPs.csv
cat SNPs.ped | awk '{printf "%s,%s%s,%s%s\n", $1, $7, $8, $9, $10}' >> SNPs.csv
cat SNPs.csv

Output:

IID,rs6923761,rs7903146
6,GG,CC
74,AG,TC
421,AG,TC
350,GG,TT
302,GG,CC

This works for 2 SNPs because I can see their column positions manually and hard-code them in the command above. But now I have 2000 SNP IDs and their values. I need help with a bash command that can process all 2000 SNPs in the same way.

markp-fuso
  • assuming the sample `SNPs.map` is what's in the `Camilleri.../...SNPs.map` file ... `cat Camilleri-SNPs/Camilleri-SNPs.map | awk '{printf ",%s", $2}'` generates `,0,0` (not `,rs6923761,rs7903146`); please update the question with the contents of `Camilleri-SNPs/Camilleri-SNPs.map`; also, what is the format of the `*.ped` file (alternatively, provide a sample from another `*.ped` file for 4x SNPs) – markp-fuso Apr 20 '22 at 14:40
  • I have updated the file names, please check –  Apr 20 '22 at 14:55
  • I have tried this bash command –  Apr 20 '22 at 14:56
  • I need help with a bash command that concatenates every pair of adjacent columns. In `cat SNPs.ped | awk '{printf "%s,%s%s,%s%s\n", $1, $7, $8, $9, $10}' >> SNPs.csv`, the first "%s" extracts $1, the first column of the ped file, and each "%s%s" joins two column values: {$7,$8}, then {$9,$10}. Similarly, I need to concatenate $1, {$7,$8}, {$9,$10}, {$11,$12} ... {1999,2000} –  Apr 20 '22 at 15:05
  • `cat SNPs.map | awk '{printf ",%s", $2}'` still generates `,0,0` so not sure how you were able to generate `,rs6923761,rs7903146` with the provided `SNPS.map` file – markp-fuso Apr 20 '22 at 15:09
  • In the map file, the second column is the SNP ID column. I'm not sure how it is extracting them, but that command works fine. Can you help with a command that concatenates every pair of adjacent columns? –  Apr 20 '22 at 15:14
  • please update the question with your additional details; formatting is lost in comments and in this case formatting is key to understanding how to parse the file(s) – markp-fuso Apr 20 '22 at 15:17
  • you've now changed the format and content of `SNPs.map` ... completely different from before ... so ***NOW*** your code generates `,rs6923761,rs7903146` – markp-fuso Apr 20 '22 at 15:20
  • The map file is a space-delimited file; please check, I have updated the details –  Apr 20 '22 at 15:21
  • Yes, like that. I have 2k SNPs, i.e. rs6923761, rs7903146, kgp22785968, kgp22786002 ... Is there any way, using slicing or a for loop, to concatenate every pair of adjacent columns starting from {$7,$8}, {$9,$10} ... until the end? –  Apr 20 '22 at 15:25

2 Answers

One awk idea that replaces all of the current code:

awk '
BEGIN   { printf "IID" }

# process 1st file:

FNR==NR { printf ",%s", $2; next }

# process 2nd file:

FNR==1  { print "" }                       # terminate 1st line of output
        { printf "%s", $1                  # print 1st column (explicit format, so a "%" in the ID is printed literally)
          for (i=7;i<=NF;i=i+2)            # loop through columns 7-NF, incrementing index +2 on each pass
              printf ",%s%s", $i, $(i+1)   # print (i)th and (i+1)th columns
          print ""                         # terminate line
        }
' SNPs.map SNPs.ped

NOTE: remove comments to declutter code

This generates:

IID,rs6923761,rs7903146
6,GG,CC
74,AG,TC
421,AG,TC
350,GG,TT
302,GG,CC
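
With 2000 SNPs it is hard to spot a misaligned row by eye, so a sanity check may be worth adding. The following is only a sketch, assuming every `*.ped` line has 6 leading columns followed by exactly 2 allele columns per SNP listed in the `*.map` file. The scratch directory and the variables `workdir`, `map`, `ped`, `out` and `nsnps` are illustrative names, not part of the original files; the sample data is copied from the question.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory holding a copy of the question's sample data.
workdir=$(mktemp -d)
map="$workdir/SNPs.map"
ped="$workdir/SNPs.ped"
out="$workdir/SNPs.csv"

printf '%s\n' '0 rs6923761 0 0' '0 rs7903146 0 0' > "$map"
printf '%s\n' \
  '6 6 0 0 2 2 G G C C' \
  '74 74 0 0 2 2 A G T C' \
  '421 421 0 0 2 2 A G T C' \
  '350 350 0 0 2 2 G G T T' \
  '302 302 0 0 2 2 G G C C' > "$ped"

awk '
BEGIN   { printf "IID" }
FNR==NR { printf ",%s", $2; nsnps++; next }   # map file: print each SNP ID and count it
FNR==1  { print "" }                          # ped file, 1st line: terminate the header
        {
          # Assumption: 6 leading columns, then 2 allele columns per SNP.
          if (NF != 6 + 2 * nsnps) {
              printf("ped line %d: expected %d fields, got %d\n", FNR, 6 + 2 * nsnps, NF) > "/dev/stderr"
              exit 1
          }
          printf "%s", $1                     # IID
          for (i = 7; i <= NF; i += 2)        # join each allele pair
              printf ",%s%s", $i, $(i+1)
          print ""
        }
' "$map" "$ped" > "$out"

cat "$out"
```

If a row has the wrong number of fields, awk exits with a non-zero status and, because of `set -e`, the script stops instead of silently writing a shifted CSV.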
markp-fuso

You can use the `--recodeA` flag in PLINK (`--recode A` in PLINK 1.9) to get an output file with IIDs as rows and SNPs as columns.

DSTO