-1

I'm in the process of learning how to use perl for genomics applications. I am trying to clean up paired end reads (1 forward, 1 reverse). These are stored in 2 files, but the lines match. What I'm having trouble doing is getting the relevant subroutines to read from the second file (the warnings I get are for uninitialized values).

These files are set up in 4 line blocks(fastq) where the first line is a run ID, 2nd is a sequence, 3rd is a "+", and the fourth holds quality values for the sequence in line 2.

I had no real trouble with this code when it was applied only for one file, but I think I'm misunderstanding how to handle multiple files.

Any guidance is much appreciated!

My warning in this scenario is as such : Use of uninitialized value $thisline in subtraction (-) at ./pairedendtrim.pl line 137, line 4.

#!/usr/bin/perl
#pairedendtrim.pl by AHU
use strict;
use warnings;

die "usage: readtrimmer.pl <file1> <file2> <nthreshold> " unless @ARGV == 3;
my $nthreshold = "$ARGV[2]";

open( my $fastq1, "<", "$ARGV[0]" );
open( my $fastq2, "<", "$ARGV[1]" );

my @forline;
my @revline;
while ( not eof $fastq2 and not eof $fastq1 ) {
    chomp $fastq1;
    chomp $fastq2;
    $forline[0] = <$fastq1>;
    $forline[1] = <$fastq1>;
    $forline[2] = <$fastq1>;
    $forline[3] = <$fastq1>;

    $revline[0] = <$fastq2>;
    $revline[1] = <$fastq2>;
    $revline[2] = <$fastq2>;
    $revline[3] = <$fastq2>;

    my $ncheckfor = removen( $forline[1] );

    my $ncheckrev = removen( $revline[1] );

    my $fortest = 0;
    if ( $ncheckfor =~ /ok/ ) { $fortest = 1 }

    my $revtest = 0;

    if ( $ncheckrev =~ /ok/ ) { $revtest = 1 }

    if ( $fortest == 1 and $revtest == 1 ) { print "READ 1 AND READ 2" }

    if ( $fortest == 1 and $revtest == 0 ) { print "Read 1 only" }

    if ( $fortest == 0 and $revtest == 1 ) { print "READ 2 only" }

}

sub removen {
    my ($thisline) = $_;
    my $ntotal = 0;
    for ( my $i = 0; $i < length($thisline) - 1; $i++ ) {
        my $pos = substr( $thisline, $i, 1 );
        #print "$pos\n";
        if ( $pos =~ /N/ ) { $ntotal++ }
    }
    my $nout;
    if ( $ntotal <= $nthreshold )    #threshold for N
    {
        $nout = "ok";
    } else {
        $nout = "bad";
    }
    return ($nout);
}
Leandro Papasidero
  • 3,728
  • 1
  • 18
  • 33
aupadhyaya
  • 31
  • 3
  • Can you remove as much code as possible while still demonstrating the problem? While doing this you may solve it on your own, if not then it makes it easier for us to spot problems and give relevant advice. – RobEarl Sep 24 '14 at 09:37
  • Hopefully that makes it a bit clearer – aupadhyaya Sep 24 '14 at 09:52
  • Add the warnings to your question, it'll tell us which variables are uninitialized and help narrow down the problem. Also, it's good practice to check the return value of your `open` statements to make sure they succeed, e.g. `open (my $fastq2, "<", "$ARGV[1]") or die "Couldn't open: $!";` – RobEarl Sep 24 '14 at 09:58
  • 1
    `my ($thisline) = $_;`, that should be `@_` – RobEarl Sep 24 '14 at 10:11
  • 1
    Ah! Thank you! Any chance you could explain what the difference entails? I'm still quite unclear on how to use these. – aupadhyaya Sep 24 '14 at 10:34
  • 1
    You can find documentation at [`perlvar`](http://perldoc.perl.org/perlvar.html#General-Variables). It's possible that the previous version used something like `while (<$fastq1>) { ... }` and `$_` happened to contain the correct value when that line was reached. – RobEarl Sep 24 '14 at 11:04

1 Answers1

0

The parameters to a subroutine are in @_, not $_

sub removen {
    my ($thisline) = @_;

I have a few other tips for you as well:

  1. use autodie; anytime that you're doing file processing.
  2. Assign the values in @ARGV to variables first thing. This quickly documents what the hold.
  3. Do not chomp a file handle. This does not do anything. Instead apply chomp to the values returned from reading.
  4. Do not use the strings ok and bad as boolean values.
  5. tr can be used to count the number times a character is in a string.

The following is a cleaned up version of your code:

#!/usr/bin/perl
#pairedendtrim.pl by AHU
use strict;
use warnings;
use autodie;

die "usage: readtrimmer.pl <file1> <file2> <nthreshold> " unless @ARGV == 3;

my ( $file1, $file2, $nthreshold ) = @ARGV;

open my $fh1, '<', $file1;
open my $fh2, '<', $file2;

while ( not eof $fh2 and not eof $fh1 ) {
    chomp( my @forline = map { scalar <$fh1> } ( 1 .. 4 ) );
    chomp( my @revline = map { scalar <$fh2> } ( 1 .. 4 ) );

    my $ncheckfor = removen( $forline[1] );
    my $ncheckrev = removen( $revline[1] );

    print "READ 1 AND READ 2" if $ncheckfor  and $ncheckrev;
    print "Read 1 only"       if $ncheckfor  and !$ncheckrev;
    print "READ 2 only"       if !$ncheckfor and $ncheckrev;
}

sub removen {
    my ($thisline) = @_;
    my $ntotal = $thisline =~ tr/N/N/;
    return $ntotal <= $nthreshold;    #threshold for N
}
Miller
  • 34,962
  • 4
  • 39
  • 60