Questions tagged [vcftools]

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.

This toolset can be used to perform the following operations on VCF files:

  • Filter out specific variants
  • Compare files
  • Summarize variants
  • Convert to different file types
  • Validate and merge files
  • Create intersections and subsets of variants
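To make the first and third operations concrete, here is a minimal sketch in plain Python of filtering records by contig and summarizing the result. It does not use or reproduce VCFtools. The function names `filter_by_contig` and `count_by_contig` and the tiny in-memory `EXAMPLE_VCF` are hypothetical, chosen only for illustration, and the sketch assumes uncompressed, tab-delimited VCF text.

```python
def filter_by_contig(lines, keep):
    """Yield header lines plus records whose CHROM (first column) is in `keep`."""
    for line in lines:
        # Header and meta lines start with '#'; always pass them through.
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


def count_by_contig(lines):
    """Return a dict mapping each CHROM value to its number of records."""
    counts = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and meta lines
        chrom = line.split("\t", 1)[0]
        counts[chrom] = counts.get(chrom, 0) + 1
    return counts


# Hypothetical three-record VCF used only for this illustration.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "contig1\t100\t.\tA\tT\t50\tPASS\t.\n"
    "contig2\t200\t.\tG\tC\t60\tPASS\t.\n"
    "contig1\t300\t.\tC\tG\t70\tPASS\t.\n"
)

if __name__ == "__main__":
    lines = EXAMPLE_VCF.splitlines(keepends=True)
    kept = list(filter_by_contig(lines, {"contig1"}))
    print("".join(kept), end="")
    print(count_by_contig(kept))
```

For real data, the toolset's own filtering and summary options are likely the better choice; the sketch only shows what "filter" and "summarize" mean at the level of individual VCF records.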

Links:

  1. Home page
  2. Documentation
  3. GitHub

42 questions
1
vote
1 answer

Combine multiple VCF files into one large VCF file

I have a list of VCF files from specific ethnicities such as American Indian, Chinese, European, etc. Under each ethnicity, I have 100+ files. Currently, I have computed the variant QC metrics such as call_rate, n_het, etc. for one file as shown…
The Great
  • 7,215
  • 7
  • 40
  • 128
1
vote
0 answers

How to filter a VCF file with a list of CHR or contig IDs?

I need to subset/filter a SNP vcf file by a long list of non-sequential contig IDs, which appear in the CHR column. My VCF file contains 13,971 contigs currently, and I want to retain a specific set of 7,748 contigs and everything associated with…
acoles
  • 11
  • 3
1
vote
2 answers

vcftools - installing on Mac

I'm trying to install vcftools on a Mac. Looking at previous posts on this issue, I made sure I've got the Mac OS X developer tools (http://www.cnet.com/how-to/install-command-line-developer-tools-in-os-x/). I followed the procedure recommended in the…
FcmC
  • 143
  • 9
1
vote
2 answers

vcf to ped format: redefine non-dbSNPs

When I am converting a vcf file to ped format (with vcftools or with vcf to ped converter of 1000G), I run into the problem that the IDs of the variants that don't have a dbSNP ID get the base pair position of that variant as an ID. Example of…
1
vote
1 answer

Preparing a Perl file to run with Ubuntu and tabix

I don't know much about Ubuntu or Perl but still need to install and run a program on it. This is what I am looking at: http://vcftools.sourceforge.net/docs.html In the installation section it says this: To build the vcftools executable, type "make" in…
Bohn
  • 26,091
  • 61
  • 167
  • 254
0
votes
0 answers

Merging two plink files

I have two plink binary files: one containing only polymorphic sites (400K SNPs), and the other a plink file with reference data containing more sites (500K). How to merge them, so that those extra 100K SNPs will not be assigned to missing in a…
Anna
  • 53
  • 6
0
votes
0 answers

vcf2maf - generate one maf file for two vcf files

I have 38 samples in vcf format and need to generate maf files for each to visualise them using MesKit in R. Some of the samples are matched tumour and normal and I was wondering if there is a way to generate one single maf file for the two vcf…
CH1374
  • 3
  • 2
0
votes
1 answer

Is It Possible to Calculate Allele Frequency in a VCF File with Python?

I have a VCF file with 200 samples (mitochondrial genome of Plasmodium falciparum). I managed to transform the raw data into Pandas dataframe. Here is a pic to take a look at: And a few relevant lines from the actual…
eh329
  • 94
  • 10
0
votes
0 answers

python error: Traceback (most recent call last), IndexError: list index out of range

I'm trying to run the below python script (vcf2treemix.py) with the command <./vcf2treemix.py -vcf allsamples14_filtered_1_autosomes38_bisnps.vcf.gz -pop allsamples14.clust.pop> I got this error with both python 2 and 3 ######### error…
0
votes
1 answer

Missing data per site

I want to calculate missing-data statistics for each site in my VCF file. Using vcftools --missing-site gives wrong stats for several sites. Is there any other way to calculate it? Thank you!
Anna
  • 53
  • 6
0
votes
1 answer

Extract variant positions from VCF dependent on contents of other columns

I have a VCF file and am trying to extract the information from these columns: #CHROM POS REF ALT. However, I would like to extract these only if the SAMPLE-1 column contains the string DeNovo (not DeNovoSV) and that SAMPLE-1, SAMPLE-2, and…
hdjc90
  • 77
  • 6
0
votes
1 answer

How to run ensembl-vep in conda

I’ve installed like so: conda install ensembl-vep=105.0-0 And then installed the human cache like this: vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT But I can’t get it to run…
Mike
  • 921
  • 7
  • 26
0
votes
1 answer

VCF file is missing mandatory header line ("#CHROM...")

I am getting an error when I try to read a VCF file using the scikit-allel library inside a Docker image running Ubuntu 18.04. It shows: raise RuntimeError('VCF file is missing mandatory header line ("#CHROM...")') RuntimeError: VCF file is…
0
votes
1 answer

creating a per sample table from a vcf using bcftools

I have a multi-sample VCF file and I want a table with the sample IDs in the left column, followed by the variants for which each sample carries an alternate allele. It should look like this: ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1 ID2 chr2:87432_A:T_1/1 ID3…
tacrolimus
  • 500
  • 2
  • 12
0
votes
2 answers

Merge three columns in one (linux, python, or perl)

I have one file (.tsv) that contains variant calls for all the samples. I would like to merge the first three columns into one column. Example: Original: file name= variants.tsv > the first three columns that I want to merge are: lane sampleID …
Alhu.A
  • 31
  • 1