
I used HaplotypeCaller (GATK 4.2.6.1) to call variants from a WES BAM file (picard.sorted.markedDup.bam), using a standard HaplotypeCaller command line.

The run appeared to complete without problems and produced a standard .vcf file, but the number of identified variants is far too high for a WES result: close to one million variants for a single sample! Did I do something wrong? What solution do you recommend? Any help would be appreciated.

The command line I used was as follows:

gatk --java-options -Xmx8g HaplotypeCaller \
    -R $refFile \
    -I ${base}.picard.sorted.markedDup.bam \
    --dont-use-soft-clipped-bases -stand-call-conf 20.0 \
    --emit-ref-confidence GVCF \
    -O ${base}.rrrrealigned.vcf

Alireza
  • It's been a while since I asked this question, and I have now found the answer. In case anybody else is interested: the VCF can be filtered by genotype quality (GQ) and/or depth (DP) with various tools. I used VCFtools with the following command line: 'vcftools --vcf input.vcf --max-missing 0.9 --minGQ 30 --minDP 20 --recode --out output.filtered.vcf'. A complete explanation can be found here: https://speciationgenomics.github.io/filtering_vcfs/ – Alireza Jan 27 '23 at 11:06
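
Before running the filter, it may help to see how many records would survive the GQ/DP thresholds from the comment above. The following is only a rough sketch, not part of VCFtools: `count_passing` is a hypothetical helper name, and it assumes an uncompressed, single-sample VCF whose FORMAT column carries GQ and DP keys. Records where either key is absent or set to "." are counted as failing.

```shell
# Hypothetical helper (not a VCFtools or GATK command): count how many records
# in a single-sample VCF have a genotype meeting minimum GQ and DP thresholds.
# Usage: count_passing <vcf> <min_gq> <min_dp>
count_passing() {
    awk -F'\t' -v min_gq="$2" -v min_dp="$3" '
        /^#/ { next }                       # skip header lines
        {
            total++
            n = split($9, keys, ":")        # FORMAT column, e.g. GT:AD:DP:GQ:PL
            split($10, vals, ":")           # first (only) sample column
            gq = -1; dp = -1                # -1 marks a key that is absent
            for (i = 1; i <= n; i++) {
                if (keys[i] == "GQ") gq = vals[i]
                if (keys[i] == "DP") dp = vals[i]
            }
            # "+ 0" forces numeric comparison; "." evaluates to 0 and fails
            if (gq + 0 >= min_gq + 0 && dp + 0 >= min_dp + 0) pass++
        }
        END { printf "%d of %d records pass\n", pass, total }
    ' "$1"
}
```

For example, `count_passing input.vcf 30 20` uses the same thresholds as the VCFtools command. With a single sample, `--minGQ` and `--minDP` mark failing genotypes as missing and `--max-missing 0.9` then drops those sites, so the count should roughly correspond to what remains after filtering. Note also that with `--recode`, VCFtools appends `.recode.vcf` to the `--out` prefix.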
