I have genotypes for over 20k individuals in a VCF file obtained after imputation. Here is an example of what the file looks like, with only 7 samples:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     INFO      FORMAT    0_0_473294.CEL      0_0_347293_v2.CEL       0_0_9588393_RS.CEL        0_0_999444_rp.CEL       0_0_26:9494949.CEL     0_0_237485_RS_rp.CEL    0_0_27:484848.CEL
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP

So the genotypes of the individuals start at column 10. Now I need to change the sample IDs in this VCF file so that it looks like this:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     INFO      FORMAT    473294     347293       9588393        999444       9494949     237485     484848
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP

In other words, I need only the serial numbers, without the flanking parts such as the 0_0_ prefix, .CEL, _RS, _rp, _v2, 26:, and so on.

Do you know of a tool, such as bcftools, that can rename the samples in a VCF file? Or is it possible to do it in bash? Thank you!

Khaleesi95

2 Answers

If you are not comfortable with Unix commands, I recommend bcftools reheader, which modifies the header of a VCF file. To change the sample names, give it a file listing the new names, one per line and in the same order as in the VCF:

bcftools reheader --samples <new names file> -o <output> <input>
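
One way to build the names file, sketched under some assumptions: every sample name looks like 0_0_<serial>.CEL, possibly with a "<digits>:" in front of the serial and suffixes such as _RS, _rp or _v2 after it (as in the seven examples in the question). The function names and the file names input.vcf and output.vcf are placeholders, not part of bcftools. The sketch uses the alternative two-column form of the --samples file ("old_name new_name" pairs), with bcftools query -l to list the current sample names.

```shell
#!/usr/bin/env bash

# Read old sample names on stdin (one per line) and print
# "old_name new_name" pairs, where new_name is the serial number only.
# Assumed layout: 0_0_[<digits>:]<serial>[suffixes].CEL
make_rename_map() {
    sed -E 's/^(0_0_([0-9]+:)?([0-9]+).*)$/\1 \3/'
}

# rename_samples INPUT OUTPUT  (hypothetical wrapper; needs bcftools in PATH)
rename_samples() {
    local input=$1 output=$2 map
    map=$(mktemp)
    bcftools query -l "$input" | make_rename_map > "$map"       # list samples, build pairs
    bcftools reheader --samples "$map" -o "$output" "$input"    # rewrite the header only
    rm -f "$map"
}

# Example call (placeholder file names):
#   rename_samples input.vcf output.vcf
```

A name that does not fit the assumed layout passes through make_rename_map unchanged, so it is worth writing the map to a file and looking at it before running reheader on the full 20k-sample file.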
ekerde

If I'm reading your question correctly, it looks like you just want to change the column names?

The sample names come in a lot of different formats. How you convert them to just the number you want will depend on the specifics, but it will probably involve a regex. I'm not sure your example has enough information to answer that part.
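
As a rough sketch of what such a regex could look like in plain bash, going only by the seven names in the question: skip the 0_0_ prefix, skip an optional "<digits>:" block, and keep the run of digits that follows. The function name extract_serial is made up for illustration, and the pattern is an assumption that may not cover every name in the real file.

```shell
#!/usr/bin/env bash

# Print the serial number contained in one sample name.
# Assumed layout: 0_0_[<digits>:]<serial>[suffixes].CEL
extract_serial() {
    local re='^0_0_([0-9]+:)?([0-9]+)'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    else
        # Report names the pattern does not cover instead of guessing.
        printf 'unrecognised sample name: %s\n' "$1" >&2
        return 1
    fi
}

# Try it on the example names from the question.
for name in 0_0_473294.CEL 0_0_347293_v2.CEL 0_0_9588393_RS.CEL \
            0_0_999444_rp.CEL 0_0_26:9494949.CEL 0_0_237485_RS_rp.CEL \
            0_0_27:484848.CEL; do
    extract_serial "$name"
done
```

With 20k samples it is also worth piping the cleaned names through sort | uniq -d, which prints any serial number that occurs more than once; two CEL files for the same individual (say, an original and a _rp rerun) would otherwise end up with the same sample name.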

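I'd recommend making a header text file (header.txt) that holds the header with the new sample names, creating a new VCF file from it (output.vcf), and then appending everything except the header of the input VCF file (input.vcf) to it. Note that a VCF header is usually more than one line (the ## meta lines plus the #CHROM line), so skip every line starting with # rather than only the first line:

# header.txt: the ## meta lines unchanged, plus the #CHROM line with the new sample names
cp header.txt output.vcf
# append the variant records, i.e. every line that does not start with '#'
grep -v '^#' input.vcf >> output.vcf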
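
The header edit and the copy can also be done in one pass. The following is a minimal sketch, assuming an uncompressed, tab-separated VCF and sample names of the form 0_0_[<digits>:]<serial>[suffixes].CEL as in the question; rename_header is a made-up name. It rewrites only the sample columns (10 onwards) of the #CHROM line and prints every other line untouched.

```shell
#!/usr/bin/env bash

# Read a VCF on stdin and write it to stdout with the sample names on the
# #CHROM line reduced to their serial numbers.
rename_header() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#CHROM/ {
             for (i = 10; i <= NF; i++) {
                 # Drop "0_0_" and an optional "<digits>:" block; only if that
                 # matched, cut everything after the run of digits that follows.
                 if (sub(/^0_0_([0-9]+:)?/, "", $i))
                     sub(/[^0-9].*$/, "", $i)
             }
         }
         { print }'
}

# Example call (placeholder file names):
#   rename_header < input.vcf > output.vcf
```

Sample names that do not start with 0_0_ are left as they are, so any name the pattern misses stays visible in the output header.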
rndy