I have a multi-sample VCF file and I want a table with the sample IDs in the left column, each followed by the variants for which that sample carries an alternate allele. It should look like this:

ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432:A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C

The table will then be read into R.

I have tried combinations of:

bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'

but the sample IDs keep ending up together on the same line, and I can't quite figure out the syntax.

Your help would be much appreciated.

tacrolimus

1 Answer

You cannot achieve what you want with a single BCFtools command. BCFtools processes a VCF one variant (record) at a time, so it cannot gather all of a sample's variants onto a single row. However, you can use a command like this as a first step:

bcftools +split -i 'GT="0/1" | GT="1/1"' -Ob -o DIR input.vcf

This will create one small .bcf file for each sample; you can then run bcftools query on each of those files to get what you want.
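
The per-sample query step could be sketched roughly as below. This is only an illustration under a few assumptions: that DIR contains one file per sample named after the sample ID (e.g. ID1.bcf), that tab-separated output is acceptable for R, and that the helper names collapse_sample and build_table are hypothetical (they are not part of BCFtools). The extra -i 'GT="alt"' filter is a precaution so that only sites where the sample carries an alternate allele are printed, whatever the split step kept.

```shell
# Hypothetical helper: join one sample's variants into a single row.
# $1 = sample ID; stdin = one "CHROM:POS:REF:ALT<TAB>GT" record per line.
collapse_sample() {
  awk -v id="$1" -F'\t' '
    BEGIN { printf "%s", id }              # row starts with the sample ID
          { printf "\t%s_%s", $1, $2 }     # append variant_genotype
    END   { printf "\n" }
  '
}

# Hypothetical wrapper: one row per .bcf file in the directory given as $1.
# Assumes each file is single-sample and named <sample>.bcf.
build_table() {
  for f in "$1"/*.bcf; do
    [ -e "$f" ] || continue                # skip if the glob matched nothing
    sample=$(basename "$f" .bcf)
    bcftools query -i 'GT="alt"' -f '%CHROM:%POS:%REF:%ALT\t[%GT]\n' "$f" |
      collapse_sample "$sample"
  done
}
```

Running build_table DIR > table.tsv would then write one line per sample. The rows have different numbers of fields, so the reader on the R side needs to tolerate ragged lines.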

Giulio Genovese