
I have two taxonomy formats produced by QIIME analyses, one from the Silva database and the other from GreenGenes. The difference between the files is that Silva files use a progressive D_number prefix for each rank (kingdom = D_0__, phylum = D_1__, class = D_2__, and so on), while GreenGenes files use a letter for each rank (kingdom = k__, phylum = p__, class = c__, and so on).

file_1 (Silva format)
D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata;D_3__Thermoplasmatales;D_4__ASC21;D_5__uncultured euryarchaeote



file_2 (GreenGenes format)
k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Streptomycetaceae;g__Streptomyces

So I wrote two Perl scripts (one for Silva and one for GreenGenes) to extract each taxon into a separate file.

Now I'm trying to make the match section handle both formats in a single piece of code.

On line 16 I want two options, something like:

my @kingd=($taxon_value[0]=~m/D_0__(.*);D_1/g | m/k__(.*);p/g);

Well, I know that it doesn't work.

So how can I put two or more alternatives in the same regex match?

This is part of the script (it has 6 options; I only show the kingdom option):

while (<INPUTFILE>){
    $line=$_;
    chomp($line);
    if ($line=~ m/^#/g){
        next;
    }
    elsif ($line=~ m/^[Uu]nassigned/g){
        next;
    }
    elsif ($line){
        my @full_line = $_;
        foreach (@full_line){
            my (@taxon_value)= split (/\t/, $_);
            foreach ($taxon_value[0]){
                if ($kingdom){
                    my @kingd=($taxon_value[0]=~m/D_0__(.*);D_1/g); # just for silva
                    foreach (@kingd){
                        if ($_=~/^$/){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]nknown/g){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]ncultured$/g){
                            next;
                        }
                        elsif ($_=~ m/^[Uu]nidentified$/g){
                            next;
                        }
                        else {
                            push @taxon_list, $_;
                        }
                    }
                }
            }
        }
    }
}

thanks

Dada
abraham

1 Answer


You need to do the "or" inside your pattern. You do that with a pipe |, which you already had, but it needs to go inside the pattern. There is no need for two match operators.

my @kingd = $taxon_value[0] =~ m/D_0__(.*);D_1|k__(.*);p/g;

It will now match either the one, or the other. See perlre and perlretut for more information. You should also read the information provided in the regex tag wiki here on SO as it contains links to many useful tools.

What you were doing in your code that didn't work is using Perl's | operator, which is a bitwise or.
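As a rough illustration of keeping the alternation inside the pattern, here is a minimal sketch. It assumes the kingdom is always the first field of the taxonomy string, and `kingdom_of` is a hypothetical helper name, not something from the original script. It puts only the two prefixes in a non-capturing group, so there is a single capture group whichever format matches:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the kingdom name from a Silva or GreenGenes
# taxonomy string, or undef if neither prefix is found at the start.
sub kingdom_of {
    my ($taxonomy) = @_;
    # The alternation lives inside the pattern. (?:...) groups without
    # capturing, so the name always ends up in capture group 1.
    my ($kingdom) = $taxonomy =~ m/^(?:D_0__|k__)([^;]*)/;
    return $kingdom;
}

my $silva = 'D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata';
my $gg    = 'k__Bacteria;p__Actinobacteria;c__Actinobacteria';

print kingdom_of($silva), "\n";   # prints "Archaea"
print kingdom_of($gg), "\n";      # prints "Bacteria"
```

Using `[^;]*` instead of `(.*)` stops the capture at the first semicolon, so the pattern does not need to name the prefix of the next rank.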

simbabque
  • Thanks so much, all of you. The truth is that I had no reason to write a separate elsif for each one; the only reason is that I'm learning Perl by myself, and sometimes the simple things in programming are not very clear to me. Well, I made the changes that all of you recommended – abraham Oct 28 '16 at 16:35
  • @abraham take a look at the tutorials that are mentioned in the tag wiki for Perl here on Stack Overflow. – simbabque Oct 28 '16 at 16:40
  • Thanks so much. Just another question, and sorry to ask so much: the changes that all of you recommended work, but now a message appears: Use of uninitialized value $_ in pattern match (m//) at Perl2.pl line..... when I use my @kingd = $taxon_value[0] =~ m/D_0__(.*);D_1|k__(.*);p/g, but if I use just m/D_0__(.*);D_1/g I get no problem. For now I just don't use the warnings module. Sorry for asking so many questions. Thanks a lot – abraham Oct 28 '16 at 16:48
  • Always use warnings, and fix them. I don't have a computer right now. I'll check later. – simbabque Oct 28 '16 at 16:50
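A likely cause of that warning: the pattern has two capture groups, and in list context a match returns a value for every group, with undef for the group in the alternative that did not take part. The later foreach then runs a pattern match on that undef element. The sketch below shows the effect and one possible fix, a branch reset group `(?|...)` (available since Perl 5.10), which makes both alternatives share capture group 1. The sub names `captures_plain` and `captures_reset` are hypothetical and used only for illustration:

```perl
use strict;
use warnings;

# Two separate capture groups: each match yields two elements,
# and the group from the alternative that did not match is undef.
sub captures_plain {
    my ($taxonomy) = @_;
    return $taxonomy =~ m/D_0__(.*);D_1|k__(.*);p/g;
}

# Branch reset: both alternatives number their group as 1,
# so each match yields exactly one defined element.
sub captures_reset {
    my ($taxonomy) = @_;
    return $taxonomy =~ m/(?|D_0__(.*);D_1|k__(.*);p)/g;
}

for my $taxonomy (
    'D_0__Archaea;D_1__Euryarchaeota;D_2__Thermoplasmata',
    'k__Bacteria;p__Actinobacteria;c__Actinobacteria',
) {
    my @plain = captures_plain($taxonomy);
    my @reset = captures_reset($taxonomy);
    printf "%d element(s) without branch reset, %d with: %s\n",
        scalar @plain, scalar @reset, join(',', @reset);
}
# prints:
# 2 element(s) without branch reset, 1 with: Archaea
# 2 element(s) without branch reset, 1 with: Bacteria
```

If branch reset is not an option, filtering the result with `grep { defined } ...` before the foreach should also avoid the warning.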