
I am trying to convert a .vcf file into the correct format for BayeScan. I tried using PGDSpider as recommended, but my .vcf file is too big and I run into a memory issue.

I then found a Perl script on GitHub that may be able to convert my file even though it is really big. The script can be found here. However, it does not correctly identify the number of populations I have. It only finds 1 population, whereas I have 30.

The top of my population file looks like this, following the example format in the Perl script:

index01_barcode_10_PA-1-WW-10     pop1 
index02_barcode_29_PA-5-Ferm-19   pop2
index01_barcode_17_PA-1-WW-17     pop1
index02_barcode_20_PA-5-Ferm-10   pop2
index03_barcode_16_PA-7-CA-14     pop3

I have also tried the script with a sorted population file. I have no experience with Perl, so I am struggling to work out why the script is not working.

I think it has to do with this section of the script, but I cannot be sure:

# read and process pop file

while (<POP>){
        chomp $_;
        @line = split /\t/, $_;
        $pops{$line[0]} = $line[1];
}
close POP;

# Get populations and sort them

my @upops = sort { $a cmp $b } uniq ( values %pops );
print "found ", scalar @upops, " populations\n";

Apologies, as I am not sure how to make this a reproducible example, but I am hoping someone could at least help me understand what this part of the code is doing and whether there is a way to adapt it. Is the problem that my individual names include _ and -?

Thank you so much for your advice and help in advance :)

Timur Shtatland
QPaps
  • Hey @toolic, thank you for your help with this. I will try to make a small example. I don't get an error: the script runs fine, but it only finds 1 population as opposed to 30. It prints out `found 1 populations` `processing SNP #" – QPaps Jun 03 '20 at 13:13
  • Yes, I opened that issue, but as there was no response I thought it would be good to ask more kind people like yourself for assistance. Do you know of any small publicly available vcf files I can use for an example? – QPaps Jun 03 '20 at 13:31
  • It's suddenly working now. I think the reason is how I made the populations file. I used this: `paste sample_names pops | column -s $'\t' -t > pop_file.txt` but if I simply use this: `paste sample_names pops > pop_file.txt` and give the full path to the vcf (not the path from the current directory), it works! Thank you so much for your help - using a small example made me problem solve much better :) – QPaps Jun 03 '20 at 13:57

1 Answer


Firstly, thank you to @toolic for his help and guidance :) Whilst trying to create a reproducible example, it started working, and I think the problem was how I made my populations file.

Previously I used `paste sample_names pops | column -s $'\t' -t > pop_file.txt` to produce the file shown in the question. However, it works if I simply use `paste sample_names pops > pop_file.txt`.
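A likely explanation, assuming the script splits each line of the population file on tab characters (as the `split /\t/` in the excerpt from the question suggests): `column -t` aligns the columns by padding them with spaces, so the tabs are gone from the output. Every line then yields an empty second field, and all samples collapse into a single "population". The sketch below reproduces this with `awk` rather than the Perl script itself; the file names and sample names are made up for illustration.

```shell
#!/bin/sh
# Illustration only: the files and sample names below are hypothetical.
tmp=$(mktemp -d)

# Tab-separated file, as `paste sample_names pops` would produce.
printf 'sampleA\tpop1\nsampleB\tpop2\nsampleC\tpop1\n' > "$tmp/pop_tabs.txt"

# Space-aligned file, similar to the output of `column -t`.
printf 'sampleA  pop1\nsampleB  pop2\nsampleC  pop1\n' > "$tmp/pop_spaces.txt"

# Count the distinct values of the second TAB-delimited field,
# mimicking a parser that splits each line on tabs only.
count_pops() {
    awk -F'\t' '{ print $2 }' "$1" | sort -u | wc -l | tr -d ' '
}

tabs_count=$(count_pops "$tmp/pop_tabs.txt")
spaces_count=$(count_pops "$tmp/pop_spaces.txt")
echo "tab-separated file: $tabs_count populations"
echo "space-aligned file: $spaces_count populations"

# Repair an aligned file by rewriting it with a single tab between fields.
# Assumes sample and population names contain no whitespace.
awk 'NF == 2 { print $1 "\t" $2 }' "$tmp/pop_spaces.txt" > "$tmp/pop_fixed.txt"
fixed_count=$(count_pops "$tmp/pop_fixed.txt")
echo "repaired file: $fixed_count populations"
```

If this is the cause, the `_` and `-` characters in the sample names are not the problem; only the separator between the two columns matters.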

I also now give the full path to the .vcf file instead of the path relative to the current directory.
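For the path, one way to get an absolute path without typing it by hand is sketched below. `abs_path` is a hypothetical helper written for this example, not part of the conversion script, and `example.vcf` is a placeholder file. Whether the relative path actually contributed to the problem is not certain.

```shell
#!/bin/sh
# abs_path is a hypothetical helper: it prints the absolute path of an
# existing file using only cd, pwd, dirname and basename.
abs_path() {
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Demonstration with a placeholder file in a temporary directory.
tmp=$(mktemp -d)
touch "$tmp/example.vcf"
cd "$tmp" || exit 1

vcf_abs=$(abs_path "./example.vcf")
echo "$vcf_abs"
```

The resulting value can then be passed to the script in place of the relative path.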

I hope this helps anyone who comes across this issue in the future :)

QPaps