0

I am a bit of new to Perl and wish to use it in order to extract reads of a specific length from my BAM (alignment) file.

The BAM file contains reads, whose length is from 19 to 29 nt. Here is an example of first 2 reads:

YT:Z:UUA00182:193:HG2NLDMXX:1:1101:29884:1078   0   3R  6234066 42  22M *   0   0   TCACTGGGCTTTGTTTATCTCA  FF:FFFF,FFFFFFFF:FFFFF  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:22   

YT:Z:UUA00182:193:HG2NLDMXX:1:1101:1777:1094    16  4   1313373 1   24M *   0   0   TCGCATTCTTATTGATTTTCCTTT    FFFFFFF,FFFFFFFFFFFFFFFF    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:24   

I want to extract only those, which are, let's say, 21 nt in length.

I try to do this with the following code:

my $string = <STDIN>;    
$length = samtools view ./file.bam | head | perl -F'\t'  -lane'length @F[10]';    
if ($length == 21){    
        print($string)    
}        

However, the program does not give any result... Could anyone please suggest the right way of doing this?

pkom
  • 1
  • 1

2 Answers2

1

Your question is a bit confusing. Is the code snippet supposed to be a Perl script or a shell script that calls a Perl one-liner?

Assuming that you meant to write a Perl script into which you pipe the output of samtools view to:

#!/usr/bin/perl
use strict;
use warnings;

while (<STDIN>) {
    my @fields = split("\t", $_);

    # debugging, just to see what field is extracted...
    print "'$fields[10]' ", length($fields[10]), "\n";

    if (length($fields[10]) eq 21) {
        print $_;
    }
}

exit 0;

With your test data in dummy.txt I get:

# this would be "samtools view ./file.bam | head | perl dummy.pl" in your case?
$  cat dummy.txt | perl dummy.pl
'FF:FFFF,FFFFFFFF:FFFFF' 22
'FFFFFFF,FFFFFFFFFFFFFFFF' 24

Your test data doesn't contain a sample with length 21 though, so the if clause is never executed.

Stefan Becker
  • 5,695
  • 9
  • 20
  • 30
  • Judging by @stack0114106 answers you might actually be looking for the 10th field. In Perl array start at index `0`, so you would need to update the code above from `10` to `9`. – Stefan Becker Feb 04 '19 at 08:49
1

Note that the 10th field in your sample input is having either 22 or 24 in length. Also, the syntax that you use is wrong. Here is the Perl one-liner to match the field with length=22.

$ cat pkom.txt
YT:Z:UUA00182:193:HG2NLDMXX:1:1101:29884:1078   0   3R  6234066 42  22M *   0   0   TCACTGGGCTTTGTTTATCTCA  FF:FFFF,FFFFFFFF:FFFFF  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:22

YT:Z:UUA00182:193:HG2NLDMXX:1:1101:1777:1094    16  4   1313373 1   24M *   0   0   TCGCATTCTTATTGATTTTCCTTT    FFFFFFF,FFFFFFFFFFFFFFFF    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:24

$ perl -lane ' print if length($F[9])==22 ' pkom.txt
YT:Z:UUA00182:193:HG2NLDMXX:1:1101:29884:1078   0   3R  6234066 42  22M *   0   0   TCACTGGGCTTTGTTTATCTCA  FF:FFFF,FFFFFFFF:FFFFF  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:22

$
stack0114106
  • 8,534
  • 3
  • 13
  • 38