I want to sort data from pdb file in perl or python

Question

I want to print atom records in the sequences used for ribose puckering.

Script in Perl:

   use strict;
   use warnings;

   open(my $list_fh, '<', 'List_NAD_ID.txt') or die $!;   # input: one PDB file name per line
   my @file1 = <$list_fh>;
   close $list_fh;

   my $OutputDir = 'C:\Users\result';   # output directory path

   foreach my $line (@file1)
   {
       chomp $line;
       open(my $fh,  '<', $line) or die $!;
       open(my $out, '>', "$OutputDir/$line.pdb") or die $!;
       print $out "\n", "$line  ";
       print $out "\n";

       while (my $file = <$fh>)
       {
           # Keep only the HETATM records of the five ribose ring atoms.
           # This filters lines in file order; it does not reorder them.
           if ($file =~ /^HETATM.{7}(?:C4B|O4B|C1B|C2B|C3B)/)
           {
               print $out $file;
           }
       }
       close $fh;
       close $out;
       print "Completed", "\n";
   }

My PDB input file:

 HETATM 3934  C4B NAD A 255      10.495 -11.444   1.016  1.00 50.46           C  
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O  
 HETATM 3936  C3B NAD A 255      10.445 -12.867   0.431  1.00 49.69           C  
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C  
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C  
 HETATM 3978  C4B NAD B 256      14.596   1.733  33.219  1.00 50.48           C  
 HETATM 3979  O4B NAD B 256      14.370   0.578  32.357  1.00 48.22           O  
 HETATM 3980  C3B NAD B 256      14.940   1.177  34.603  1.00 49.64           C  
 HETATM 3982  C2B NAD B 256      14.987  -0.347  34.401  1.00 48.48           C  
 HETATM 3984  C1B NAD B 256      14.066  -0.517  33.189  1.00 46.98           C

Expected Result:

I want to copy the following atoms and then paste them in the following sequence. Everything should be done chain by chain (chain A, B, C, ...).

 HETATM 3934  C4B NAD A 255      10.495 -11.444   1.016  1.00 50.46           C
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C
 HETATM 3935  O4B NAD A 255      10.768 -11.615   2.448  1.00 48.17           O
 HETATM 3940  C1B NAD A 255      11.323 -12.898   2.593  1.00 46.97           C
 HETATM 3938  C2B NAD A 255      10.431 -13.759   1.675  1.00 48.46           C
 HETATM 3936  C3B NAD A 255      10.445 -12.867   0.431  1.00 49.69           C
 .
 .
 .

I have five paste sequences (levels): v0, v1, v2, v3, v4.

The sequences are:

C4B-O4B-C1B-C2B
O4B-C1B-C2B-C3B
C1B-C2B-C3B-C4B
C2B-C3B-C4B-O4B
C3B-C4B-O4B-C1B

I want to print the data following all of the sequences above. I have also edited the expected result.

I want to sort the data according to the sequences above, chain by chain. I am not getting the expected result. I have tried in Perl. I am new to Perl and Python, so please help me solve my problem.

It's like a matrix problem:

For example, we have five values: 1, 2, 3, 4, 5

Row 1 - 1  2  3  4  
Row 2 - 2  3  4  5 
Row 3 - 3  4  5  1
Row 4 - 4  5  1  2

I want to print the data like that for each chain, from chain A to Z.
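
The rotation in the matrix example can be sketched in Python as below. This is only an illustration, not a tested solution for real files: the `RING` order is read off the five sequences listed earlier, and the helper name `cyclic_windows` is made up for this sketch.

```python
# Ring atoms in the order in which the five listed sequences start
# (C4B, O4B, C1B, C2B, C3B). This order is an assumption taken from
# the sequences in the question.
RING = ["C4B", "O4B", "C1B", "C2B", "C3B"]


def cyclic_windows(items, width):
    """Return one window of `width` consecutive items per start position,
    wrapping around the end of the list."""
    n = len(items)
    return [[items[(start + k) % n] for k in range(width)] for start in range(n)]


# Five windows of four atoms each, one per start position in RING.
SEQUENCES = cyclic_windows(RING, 4)

if __name__ == "__main__":
    for row in SEQUENCES:
        print("-".join(row))
```

Applied to the values 1 to 5 with a width of 4, the same helper produces the rows of the matrix example, plus a fifth row that starts at 5.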

Could you describe how you would like to sort the data? The rules aren't clear from the example. — choroba, Oct 06 '16 at 07:05
@ Choroba, I want to copy all required ATOM and then print only (C4B, O4B, C1B, C2B, C3B) as per sequence. — krish, Oct 06 '16 at 08:28
I don't get it. Why is line 3 coming before line 4, when C1B precedes C2b in the sequence line 1? Or do you sort by column 2? But then, why doesn't line 7 precede line 6? — choroba, Oct 06 '16 at 08:35
Actually, I am following Ribose Puckering (Chemistry rule), for that I want to use. I have 2000 file to sort and make sequence as per Ribose Puckering rule. I have edited required sequence. — krish, Oct 06 '16 at 08:41
In your code, you're not sorting anything. You just open a lot of files and print a bunch of stuff. — simbabque, Oct 06 '16 at 08:52
Yeah, I don't know how to sort this type of sequence, I am new in perl — krish, Oct 06 '16 at 08:54
This is a programming site, not chemistry. Can you explain the rule? — choroba, Oct 06 '16 at 09:10
Dear @choroba, Simply I want to copy data from input file and then paste as per this sequence "C4B-O4B-C1B-C2B" (this is atom name). I have also edited expected output, I want to sort data as per sequence. — krish, Oct 06 '16 at 09:12
Then again, why is line 3 coming before line 4, when C1B precedes C2B in the sequence line 1? — choroba, Oct 06 '16 at 09:14
Dear, Its a sequence, that why I want to follow same rule. data will be same but position will be change for all coming atom. I want to copy paste as per given sequence. — krish, Oct 06 '16 at 09:18
We don't understand the rule unless you explain it. How does it relate to sorting? — simbabque, Oct 06 '16 at 10:33
@simbabque and choroba, I have explained above, its like matrix problem — krish, Oct 08 '16 at 07:30

score 0 · Answer 1 · answered Oct 06 '16 at 11:31

If you want to use Biopython, you have to create all the chains and insert the atoms into them. The atoms must be held in a Residue for this to work:

from Bio.PDB import PDBParser, PDBIO, Chain, Residue

# This is your source structure
pdb = PDBParser().get_structure("UGLY", "ugly.pdb")

# Now you cycle all your chains
for chain in pdb.get_chains():
    # Load all the atoms and residues in each Chain
    atoms = list(chain.get_atoms())
    residues = list(chain.get_residues())

    # Start a new structure to save the output
    io = PDBIO()
    this_chain = Chain.Chain("A")
    this_residue = Residue.Residue(residues[0].id,
                                   residues[0].resname,
                                   residues[0].segid)

    # Now get the atoms in your source structure that matches your sort keys
    # You should refactor this out to a function that accepts a sort key
    #  and returns a list of atoms or a residue with the atoms added.
    for atom_name in "O4B-C1B-C2B-C3B".split("-"):
        for atom in atoms:
            if atom.get_name() == atom_name:
                this_residue.add(atom)

    # Add the residue to a structure and save it
    this_chain.add(this_residue)
    io.set_structure(this_chain)
    # And now write your output file. Remember to change the name!
    io.save("temp.pdb")
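
Because the expected output in the question repeats the same atom lines several times, it may be simpler to work on the text lines directly. The following is a rough sketch without Biopython, under several assumptions: the files follow the fixed PDB column layout (atom name in columns 13-16, chain ID in column 22), each chain holds a single NAD residue (only the first record per atom name is kept), and the five sequences map to v0..v4 in the order listed in the question. The function names are invented for this sketch.

```python
from collections import defaultdict

# The five sequences from the question, assumed to be v0..v4 in this order.
SEQUENCES = [
    ["C4B", "O4B", "C1B", "C2B"],
    ["O4B", "C1B", "C2B", "C3B"],
    ["C1B", "C2B", "C3B", "C4B"],
    ["C2B", "C3B", "C4B", "O4B"],
    ["C3B", "C4B", "O4B", "C1B"],
]


def ring_atoms_by_chain(lines):
    """Map chain ID -> {atom name: HETATM line} using fixed PDB columns."""
    chains = defaultdict(dict)
    for line in lines:
        if not line.startswith("HETATM") or len(line) < 22:
            continue
        name = line[12:16].strip()   # atom name, columns 13-16
        chain_id = line[21]          # chain ID, column 22
        # Keep the first record per atom name (assumes one residue per chain).
        chains[chain_id].setdefault(name, line.rstrip("\n"))
    return chains


def ordered_records(lines, sequences=SEQUENCES):
    """Return the HETATM lines repeated in sequence order, chain by chain."""
    out = []
    chains = ring_atoms_by_chain(lines)
    for chain_id in sorted(chains):
        atoms = chains[chain_id]
        for seq in sequences:
            # Skip a sequence if the chain lacks any of its atoms.
            if all(name in atoms for name in seq):
                out.extend(atoms[name] for name in seq)
    return out
```

Calling `ordered_records(open("ugly.pdb"))` would return a list of lines, 20 per chain when all five ring atoms are present. Skipping incomplete sequences is a design choice of this sketch; raising an error instead may suit your data better.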

xbello


Thanks @xbello, can you please let me know how to add directory for input and output. I have 2000 pdb files and chain ID may be A to Z. – krish Oct 06 '16 at 16:18
You have to use either `os.path` or `glob` modules, and use the results to feed the parser/writer functions. It's a whole different question. – xbello Oct 06 '16 at 16:37
I have tried but getting error can you please edit above program – krish Oct 06 '16 at 16:50
Please let me know how to add chain A to Z and atom name: C4B-O4B-C1B-C2B-O4B-C1B-C2B-C3B-C1B-C2B-C3B-C4B-C2B-C3B-C4B-O4B-C3B-C4B-O4B-C1B. – krish Oct 07 '16 at 08:09
Have you ever used python? You have to tweak the `Chain.Chain("A")` and `for atom_name in "O4B-C1B-C2B-C3B"` lines to suit your needs, that are very unclear. Did you notice that your expected results doesn't include other chain than "A"? – xbello Oct 07 '16 at 08:34
Yeah I used python, but not much I am new in python, I tried this Chain.Chain("A","B","C","D") and for atom_name in "C4B-O4B-C1B-C2B-O4B-C1B-C2B-C3B". but getting error. sorry for trouble but please try to help me dear. – krish Oct 07 '16 at 09:13
Do you have any solution? please let me know – krish Oct 12 '16 at 08:01
No, apart from what is already posted. Your question is too broad and you didn't supply any code of your own that we can fix. It's almost impossible to code something that suits your data without seeing either the code or the data. – xbello Oct 12 '16 at 12:04
Okay dear no problem, Thanks – krish Oct 12 '16 at 14:23
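
For the input and output directory question raised in the comments, one possible shape is sketched below. It uses `pathlib.Path.glob` rather than the `os.path` or `glob` modules named above, and the names `process_directory` and `keep_hetatm` are hypothetical. The `transform` argument stands for whatever per-file reordering you settle on; `keep_hetatm` is only a placeholder.

```python
from pathlib import Path


def process_directory(in_dir, out_dir, transform, pattern="*.pdb"):
    """Apply `transform` (list of lines -> list of lines) to every file in
    `in_dir` matching `pattern`, writing same-named files into `out_dir`."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob(pattern)):
        lines = src.read_text().splitlines()
        result = transform(lines)
        dst = out_dir / src.name
        dst.write_text("\n".join(result) + "\n")
        written.append(dst)
    return written


def keep_hetatm(lines):
    """Placeholder transform: keep only HETATM records, in file order."""
    return [line for line in lines if line.startswith("HETATM")]
```

A call such as `process_directory("input_pdbs", "result", keep_hetatm)` would then process every `.pdb` file in one pass; both directory names here are examples only.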
