I have QIIME taxonomy output in two formats, one from the SILVA database and the other from Greengenes. The difference is that SILVA files use a numbered prefix D_<n>__ for each rank (kingdom = D_0__, phylum = D_1__, class = D_2__, and so on), while Greengenes files use a letter for each rank (kingdom = k__, phylum = p__, class = c__, and so on).
file_1 (SILVA format):
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote
file_2 (Greengenes format):
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces
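As a sketch of how close the two formats are, both example lines can be reduced to bare rank names with one substitution whose prefix pattern alternates between D_<n>__ and a single rank letter followed by __. This assumes the Greengenes rank letters are k, p, c, o, f, g and s and that fields are separated by ';' (optionally followed by a space); split_lineage is a hypothetical helper name, not part of the original scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: split a QIIME lineage into bare rank names.
# Assumption: SILVA prefixes look like "D_<n>__", Greengenes prefixes are
# one rank letter (k, p, c, o, f, g, s) followed by "__".
sub split_lineage {
    my ($lineage) = @_;
    my @names;
    for my $field (split /;\s*/, $lineage) {
        # One alternation strips either prefix style
        (my $name = $field) =~ s/^(?:D_\d+|[kpcofgs])__//;
        push @names, $name;
    }
    return @names;
}

my $silva = 'D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote';
my $gg    = 'k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces';

print join(' | ', split_lineage($silva)), "\n";
print join(' | ', split_lineage($gg)), "\n";
```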
So I wrote two Perl scripts (one for SILVA and one for Greengenes) that extract each taxonomic rank into a separate file.
Now I'm trying to make the match section handle both formats. On line 16 of the excerpt below (the line that builds @kingd), I want two alternatives, something like:
my @kingd=($taxon_value[0]=~m/D_0__(.*);D_1/g | m/k__(.*);p/g);
Of course, that doesn't work.
So how can I put two or more alternative patterns into the same match regex?
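One way to do this, offered as a sketch rather than the only answer, is to put the alternatives inside a single regex with | and a non-capturing group (?:...), so that one capture group holds the name whichever prefix matched. In the attempt above the | sits outside the two regexes: =~ binds more tightly, so Perl computes a bitwise OR of two scalar match results (true/false, not names), and the second m// runs against $_ rather than $taxon_value[0]. The sketch also swaps the greedy (.*);D_1 for ([^;]*), which stops at the next semicolon and still works when a lineage ends at the kingdom. The sub name kingdom_of is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the kingdom name from a SILVA or Greengenes
# lineage as a one-element list, or an empty list if neither prefix is found.
sub kingdom_of {
    my ($lineage) = @_;
    # (?:D_0__|k__) accepts either prefix without adding a capture group;
    # ([^;]*) captures up to the next ';' or the end of the string.
    return ($lineage =~ m/(?:D_0__|k__)([^;]*)/);
}

# In the script itself, the match could read:
#   my @kingd = ($taxon_value[0] =~ m/(?:D_0__|k__)([^;]*)/);
my @kingd = kingdom_of('D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata');
print "@kingd\n";    # Archaea
@kingd = kingdom_of('k__Bacteria;p__Actinobacteria;c__Actinobacteria');
print "@kingd\n";    # Bacteria
```

The same pattern extends to more than two formats by adding further alternatives inside the (?:...) group.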
Here is part of the script (it has six options, one per rank; I've only included the kingdom option):
    while (<INPUTFILE>) {
        my $line = $_;
        chomp($line);

        # Skip comment/header lines, unassigned features and empty lines
        next if $line =~ m/^#/;
        next if $line =~ m/^[Uu]nassigned/;
        next unless $line;

        # The lineage string is the first tab-separated column
        my @taxon_value = split /\t/, $line;

        if ($kingdom) {
            # Capture the kingdom name between "D_0__" and ";D_1" (SILVA only);
            # this is the match that should also accept the Greengenes "k__" prefix
            my @kingd = ($taxon_value[0] =~ m/D_0__(.*);D_1/);
            foreach (@kingd) {
                # Drop empty names and placeholder names
                next if /^$/;
                next if /^[Uu]nknown/;
                next if /^[Uu]ncultured$/;
                next if /^[Uu]nidentified$/;
                push @taxon_list, $_;
            }
        }
    }
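The same alternation idea could cover all six options in one pass. This is a minimal sketch, assuming the lineage is in the first tab-separated column (as in the script above) and that only kingdom through genus are needed; the @ranks table and the is_placeholder and extract_ranks subs are hypothetical names, and is_placeholder folds the four placeholder tests from the loop into one sub.

```perl
use strict;
use warnings;

# Hypothetical table: rank name, SILVA prefix, Greengenes prefix.
my @ranks = (
    [ kingdom => 'D_0__', 'k__' ],
    [ phylum  => 'D_1__', 'p__' ],
    [ class   => 'D_2__', 'c__' ],
    [ order   => 'D_3__', 'o__' ],
    [ family  => 'D_4__', 'f__' ],
    [ genus   => 'D_5__', 'g__' ],
);

# Same tests as the script's loop: empty, starts with "Unknown",
# or exactly "uncultured" / "unidentified" (either case of first letter).
sub is_placeholder {
    my ($name) = @_;
    return $name eq ''
        || $name =~ /^[Uu]nknown/
        || $name =~ /^[Uu]n(?:cultured|identified)$/;
}

# Return a hash ref of rank => name for one lineage, skipping placeholders.
sub extract_ranks {
    my ($lineage) = @_;
    my %found;
    for my $r (@ranks) {
        my ($rank, $silva, $gg) = @$r;
        # A field starts at the beginning of the string or after ';' plus
        # optional spaces; \Q...\E treats the prefixes as literal text.
        if ($lineage =~ m/(?:^|;)\s*(?:\Q$silva\E|\Q$gg\E)([^;]*)/) {
            my $name = $1;
            $found{$rank} = $name unless is_placeholder($name);
        }
    }
    return \%found;
}

# Collect names per rank (each list could then go to its own output file).
my %taxon_list;
for my $line (
    "D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote",
    "k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces",
) {
    my ($lineage) = split /\t/, $line;    # first column, as in the script
    my $found = extract_ranks($lineage);
    push @{ $taxon_list{$_} }, $found->{$_} for keys %$found;
}

for my $rank (map { $_->[0] } @ranks) {
    print "$rank: ", join(', ', @{ $taxon_list{$rank} || [] }), "\n";
}
```

Anchoring each prefix to the start of a field keeps a letter prefix such as c__ from matching in the middle of another field, and the optional spaces cover Greengenes lineages written with "; " separators.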
Thanks!