How can multiple PDB structures be written to a single PDB file using Biopython? The documentation covers reading files that contain multiple models, such as NMR structures, but I cannot find anything on writing them. Does anybody have an idea?

cel
Exchhattu

3 Answers

Yes, you can. It's documented here. Imagine you have a list of structure objects; let's call it structures. You might want to try:

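from Bio import PDB

pdb_io = PDB.PDBIO()

target_file = 'all_struc.pdb'
# Open the output file once and write every structure into it
with open(target_file, 'w') as open_file:
    for struct in structures:
        # struct[0] is the first model of each structure
        pdb_io.set_structure(struct[0])
        pdb_io.save(open_file)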

That is the simplest solution for this problem. Some important things:

  • Different protein crystal structures use different coordinate systems, so you will probably need to superimpose them, or apply some other transformation, before comparing them.
  • pdb_io.set_structure accepts a whole structure, a model, a chain or even a set of atoms.
  • pdb_io.save takes an optional second argument, an instance of a Select subclass. It lets you remove waters, heteroatoms, unwanted chains...
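As a rough illustration of the Select argument mentioned above, here is a minimal sketch. The class name ProteinOnly and the helper save_protein_only are made up for this example; the only Biopython pieces relied on are Select.accept_residue, PDBIO.set_structure and the select argument of PDBIO.save.

```python
from Bio.PDB import PDBIO, Select
from Bio.PDB.Residue import Residue


class ProteinOnly(Select):
    """Hypothetical selector: keep standard residues, drop waters and heteroatoms."""

    def accept_residue(self, residue):
        # residue.id is (hetero_flag, sequence_number, insertion_code).
        # The flag is " " for standard residues, "W" for water and
        # "H_xxx" for other hetero residues.
        return residue.id[0] == " "


def save_protein_only(entity, handle):
    """Write entity (structure, model or chain) to handle, filtered by ProteinOnly."""
    pdb_io = PDBIO()
    pdb_io.set_structure(entity)
    pdb_io.save(handle, select=ProteinOnly())


# Quick check of the selector on two hand-built residues
selector = ProteinOnly()
alanine = Residue((" ", 1, " "), "ALA", " ")
water = Residue(("W", 2, " "), "HOH", " ")
```

In the loop above, pdb_io.save(open_file, select=ProteinOnly()) would apply the same filter to every structure written.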

Be aware that NMR structures contain multiple models. You might want to select just one, as struct[0] does above. Hope this helps.

tbrittoborges

Mithrado's solution may not actually achieve what you want. His code does write all the structures into a single file, but in a way that other software may not be able to read: it adds an "END" line after each structure. Many programs stop reading the file at that point, because that is how the PDB file format is specified.
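To make the record layout concrete, here is a small sketch in plain Python (no Biopython) that joins several PDB-format strings into one file body with a single trailing END. The function name merge_as_models is invented for this example, and it assumes each input string holds exactly one model; header records such as CRYST1 are copied through unchanged, which may or may not suit your downstream tools.

```python
def merge_as_models(pdb_texts):
    """Join PDB-format strings into one multi-model PDB string.

    Each input becomes one MODEL ... ENDMDL block, and a single END
    record is written at the very end.
    """
    skipped_records = {"MODEL", "ENDMDL", "END"}
    out = []
    for serial, text in enumerate(pdb_texts, start=1):
        # MODEL record: name in columns 1-6, serial number in columns 11-14
        out.append(f"MODEL     {serial:>4}")
        for line in text.splitlines():
            if not line.strip():
                continue  # ignore blank lines
            if line[:6].strip() in skipped_records:
                continue  # drop the per-structure END / MODEL / ENDMDL records
            out.append(line)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```

A reader that stops at the first END will then see every block instead of only the first structure.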

A better solution, but still not perfect, is to remove a chain from one Structure and add it to a second Structure as a different chain. You can do this by:

# Get a list of the chains in a structure
chains = list(structure2.get_chains())
# Rename the chain (in my case, I rename from 'A' to 'B')
chains[0].id = 'B'
# Detach this chain from structure2
chains[0].detach_parent()
# Add it onto structure1
structure1[0].add(chains[0])

Note that you have to be careful that the name of the chain you're adding doesn't yet exist in structure1.
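One way to avoid such a clash is to pick the new chain ID from the letters not yet in use. The helper below is only a sketch: first_free_chain_id is a made-up name, and it assumes that iterating over the model yields chain objects with an id attribute (which is how a Biopython Model behaves) and that single uppercase letters are acceptable IDs.

```python
import string


def first_free_chain_id(model):
    """Return the first uppercase letter not used as a chain ID in model."""
    used = {chain.id for chain in model}
    for candidate in string.ascii_uppercase:
        if candidate not in used:
            return candidate
    raise ValueError("all single-letter chain IDs A-Z are already in use")
```

With it, the rename step could become chains[0].id = first_free_chain_id(structure1[0]) instead of a hard-coded 'B'.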

In my opinion, the Biopython library is poorly structured or non-intuitive in many respects, and this is just one example. Use something else if you can.

Nate

Inspired by Nate's solution, but adding multiple models to one structure, rather than multiple chains to one model:

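from Bio import PDB
from Bio.PDB.Structure import Structure

ms = Structure("master")

# Copy every model of every structure into ms, giving each a unique id
i = 0
for structure in structures:
    for model in list(structure):
        new_model = model.copy()
        new_model.id = i
        # serial_num is the number written in the MODEL record
        new_model.serial_num = i + 1
        i += 1
        ms.add(new_model)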

pdb_io = PDB.PDBIO()
pdb_io.set_structure(ms)
pdb_io.save("all.pdb")