
I am trying to write a program that will accept a PDB file, extract all the information (atom number, atom type, residue name, residue number, x, y, z, B factor), renumber the residues, and save the result to a new PDB file. I can't find a way to loop over an array of strings.

This is the code:

print "\nEnter the input file: ";
$inputFile = <STDIN>;
chomp $inputFile;

unless ( open( INPUTFILE, $inputFile ) ) {
    print "Cannot read from '$inputFile'.\nProgram closing.\n";
    <STDIN>;
    exit;
}

chomp( @dataArray = <INPUTFILE> );
close(INPUTFILE);
for ( $line = 0 ; $line <= scalar @dataArray ; $line++ ) {
    if ( $dataArray[$line] =~ m/ATOM\s+(\d+)\s+(\w+)\s+(\w{3})\s+(\w)+\s+(\d+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(\S+\.\S+)\s+(.+\S)(.\d\d+\.\d\d.+)/ig ) {
        $m1  = $1;
        $m2  = $2;
        $m3  = $3;
        $m5  = $5;
        $m6  = $6;
        $m7  = $7;
        $m8  = $8;
        $m9  = $9;
        $m10 = $10;
        push( @m3, $m3 );
        push( @m5, $m5 );

        foreach $line ( @m3, @m5 ) {
            if ( $m3[$line] eq $m3[ $line + 1 ] ) {
                $m5[i] = $m5[ i + 1 ];
            }
            elsif ( $m3[$line] ne $m3[ $line + 1 ] ) {
                $m5[ i + 1 ] = $m5[i] + 1;
            }
        }

        $~ = "PDBFORMAT";

        format PDBFORMAT =

ATOM @|||| @||| @|| @|||     @|||||| @|||||| @|||||| @>>>>> @>>>>>

$m1, $m2, $m3,$m51,     $m6,    $m7,    $m8,    $m9,   $m10
.

        open( PDBFORMAT, ">>my2pdb.txt" ) or die "Can't open anything";
        write PDBFORMAT;
    }
}

close PDBFORMAT;

I need a script that makes the 6th column (residue number) continuous according to the residue name (4th column).

This is an example of the input:

ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27
Borodin
  • First of all, put `use strict; use warnings;` at the beginning of your script. Then fix the errors, and come back if there are some you don't understand. – Toto Jun 18 '15 at 10:53
  • Can you give an example input and output? Your code has some strange things going on, which I suspect will be the root of your problem - like how you push a single value into `@m3` and `@m5` but then try and reshuffle them after each line of your input data. – Sobrique Jun 18 '15 at 10:54
  • You need to use *meaningful* names for your identifiers. No one can tell what `m3` and `m5` might be – Borodin Jun 18 '15 at 10:59
  • I need to check $m3, which is the residue name: if it is the same as the previous value in the $m3 column, the $m5 column should keep the same value; if not, it should be incremented by 1. I am not sure how to access the next and previous values of a column. – nastaziales Jun 18 '15 at 11:03
  • Then `$m3` should be `$residue_name`. I still don't know what `$m5` is – Borodin Jun 18 '15 at 11:06
  • This is an example of the input. I need to make a script that will make the 6th column continuous according to the residue name (4th column): 'ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64 ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07 ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68 ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63 ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99 ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27' – nastaziales Jun 18 '15 at 11:08
  • Sorry for the nomenclature, $m5 is the residue number. – nastaziales Jun 18 '15 at 11:11
  • Edit the data into your post - it Just Doesn't Work in comments. Source data (sample) and output (sample; based on source sample) will help give an answer. – Sobrique Jun 18 '15 at 11:17

1 Answer


I think this will do what you want. Your sample data isn't very comprehensive, so all it does here is change the final residue number to 609.

This program expects the path to the input file as a parameter on the command line, so run it with something like

perl process_pdb.pl infile.pdb

use strict;
use warnings;

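# Residue name and (renumbered) residue number from the previous ATOM line
my ($last_name, $last_num);

# Read each line of the file(s) named on the command line
while ( <> ) {

  # Skip everything except ATOM records
  next unless /^ATOM/;

  # Split on whitespace: field 3 is the residue name, field 5 the residue number
  my @fields = split;
  my $name = $fields[3];

  # After the first ATOM line, keep the previous number while the residue
  # name is unchanged, otherwise move on to the next number
  if ( $last_name ) {
    $fields[5] = $name eq $last_name ? $last_num : $last_num + 1;
  }

  # The fields are rejoined with single spaces
  print "@fields\n";

  ($last_name, $last_num) = @fields[3,5];
}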

output

ATOM 316 CB LEU A 608 -38.110 31.803 16.459 1.00 64.64
ATOM 317 CG LEU A 608 -39.261 32.481 15.719 1.00 71.07
ATOM 318 CD1 LEU A 608 -38.782 33.704 14.929 1.00 73.68
ATOM 319 CD2 LEU A 608 -39.981 31.498 14.829 1.00 69.63
ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99
ATOM 321 N ARG A 609 -38.634 34.587 18.911 1.00 22.27
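
The output above is joined with single spaces, whereas the PDB format places each field at fixed column positions. As a rough sketch only, the same renumbering could emit fixed-width ATOM records with `sprintf`. Several things here are assumptions rather than part of the original answer: `renumber_atoms` is a hypothetical helper name, the input is assumed to be whitespace-separated with exactly the eleven fields shown in the sample, and atom names are simply left-justified in columns 13-16, which simplifies the real alignment convention.

```perl
use strict;
use warnings;

# Hypothetical helper: the residue number advances only when the residue
# name changes; returns fixed-width ATOM lines (without trailing newlines).
sub renumber_atoms {
    my @lines = @_;
    my ( $last_name, $last_num );
    my @out;

    for my $line (@lines) {
        next unless $line =~ /^ATOM/;

        # Assumed field order, as in the sample input:
        # ATOM serial atom resName chain resSeq x y z occupancy bfactor
        my ( undef, $serial, $atom, $res_name, $chain, $res_num,
             $x, $y, $z, $occ, $bfac ) = split ' ', $line;

        if ( defined $last_name ) {
            $res_num = $res_name eq $last_name ? $last_num : $last_num + 1;
        }
        ( $last_name, $last_num ) = ( $res_name, $res_num );

        # Columns: 1-6 record, 7-11 serial, 13-16 atom, 18-20 residue name,
        # 22 chain, 23-26 residue number, 31-54 x/y/z, 55-60 occupancy,
        # 61-66 B factor
        push @out, sprintf(
            "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f",
            $serial, $atom, $res_name, $chain, $res_num,
            $x, $y, $z, $occ, $bfac );
    }
    return @out;
}

# Small demonstration using two lines from the sample input
my @sample = (
    'ATOM 320 H LEU A 608 -36.638 31.041 18.563 1.00 99.99',
    'ATOM 321 N ARG A 565 -38.634 34.587 18.911 1.00 22.27',
);
print "$_\n" for renumber_atoms(@sample);
```

Because the helper returns the lines instead of printing them, the caller decides where they go, for example to a file opened for writing rather than to standard output.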
Borodin