Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each letter standing for one amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, etc.).

The standard FASTA format looks like this:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
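Reading such a file amounts to splitting it on the `>` header lines and joining the sequence lines that follow each header. The following is a minimal sketch in plain Python, not a reference implementation: the function name `parse_fasta` is hypothetical, the example sequences are shortened to the first and last lines of the records above, and taking the text before the first `|` as the accession is an assumption based on the headers shown here.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; the header is returned
    without its leading '>' and the sequence lines are concatenated.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record in the input.
    if header is not None:
        yield header, "".join(chunks)


# Shortened copies of the two records shown above (first and last lines only).
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
EQNKEALQDVEDENQ
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    # Assumption: the accession is the text before the first '|'.
    accession = header.split("|", 1)[0]
    print(accession, len(sequence))
```

For real work, an established parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is usually a better choice than a hand-written loop.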

A great deal of work in bioinformatics involves storing (annotating), searching, and analyzing the sequences in these databases.

145 questions
2
votes
1 answer

Why is this source code giving incorrect RMSD value?

Aligning PDB structures with Biopython The following source code is obtained from the above link: import Bio.PDB # Select what residues numbers you wish to align # and put them in a list start_id = 1 end_id = 70 …
user366312
  • 16,949
  • 65
  • 235
  • 452
2
votes
1 answer

How can I compare protein sequences to find closest match

How could I build a tool to help with this scenario : I work in a lab where we use plasmids to express recombinant proteins. We have a database containing all the plasmid identifiers and the sequence of the protein that they code for. When a new…
2
votes
0 answers

How to save NumPy array to PDB (Protein Data Bank) formatted files in python (pycharm)

Hei, I am not quite sure if this might be a trivial question but I am having some troubles with it. I am trying to do the following: I downloaded a folder of about 8000 pdb files on my computer. I converted the folder into an array using:…
Jennan
  • 89
  • 11
2
votes
1 answer

Biopython PDB: calculate distance between an atom and a point

Using a typical pdb file, I am able to calculate the distance between two atoms in a structure using a method similar to that presented in the Biopython documentation. Shown here: from Bio import PDB parser = PDB.PDBParser() pdb1 ='./4twu.pdb'…
MunKee
  • 23
  • 1
  • 4
2
votes
1 answer

How to add chain id in pdb

Using the Biopython library, I would like to add chain IDs to my PDB file. I'm using p = PDBParser() structure=p.get_structure('mypdb',mypdb.pdb) model=structure[0] model.child_list=["A","B"] But I got this error: Traceback (most recent call…
2
votes
1 answer

Extract multiple protein chains from single PDB file

I have a PDB file that contains multiple chains, though no chainid's. I would like to use R to assign chainid's so that I can analyze individual protein chains and find specific sites within each. I am currently using Rpdb to extract the files and…
desc
  • 1,190
  • 1
  • 12
  • 26
2
votes
2 answers

Remove heteroatoms from PDB

The heteroatoms in a PDB file have to be removed. Here is the code, but it did not work with my test PDB 1C4R. for model in structure: for chain in model: for reisdue in chain: id = residue.id if id[0] != ' ': …
Exchhattu
  • 197
  • 3
  • 15
2
votes
1 answer

Pymol not outputting image

I am trying to draw a protein structure from a pdb file using pymol. However, when I try to run the script below, a pymol window opens but it is just pitch black. Also, bizarrely, the pdb file is outputted to the shell. Here is my…
Charon
  • 2,344
  • 6
  • 25
  • 44
2
votes
2 answers

Commercial databases adept in storing biological sequences

Which commercial databases are adept in storing biological sequences like Protein/DNA sequence? Are there any which were designed specifically to store such sequences? cheers
Arnkrishn
  • 29,828
  • 40
  • 114
  • 128
2
votes
1 answer

Retrieving and parsing protein sequences from GenBank using Entrez in BioPython

As will soon be obvious, I am new to Python and coding in general. I have a list of Gene IDs stored as a text file and I want to use the Entrez functions to search the GenBank database and retrieve the protein sequences corresponding to the IDs.…
JayB
  • 103
  • 2
  • 9
2
votes
1 answer

Parsing a PDB file with multiple structures into an array

I have a PDB file with a few thousand structures, and I would like to save the position coordinates of, say, the alpha carbons of the first ten structures into a numpy array. I can parse a PDB file with a single structure into an array using the…
2
votes
2 answers

Generating a random subset of sequences from a FASTA file

Hello to the Perl masters of the world. I have another programming problem. I am writing a program that selects a given number of random sequences from a proteome FASTA file. A general FASTA file looks like this: >seq_ID_1 descriptions…
Karyo
  • 372
  • 2
  • 4
  • 21
2
votes
1 answer

How to parse PQR files with Biopython

I would like to enable Biopython to read PQR files (modified PDB files with occupancy and B factor replaced by atom charge and radius). The Biopython PDB parser fails to read the Bfactor because it retrieves the value by PDB column indices (which…
qdelettre
  • 1,873
  • 5
  • 25
  • 36
1
vote
0 answers

Bond types of ligands in PDB files all appear as SINGLE

I am learning RDKit. At the moment I want to extract information from the ligand docked in the protein. The problem I face is that the bonds of the ligand are always returned as SINGLE, no matter their actual types, while the bond types of the protein…
vdlmrc
  • 737
  • 5
  • 17
1
vote
1 answer

Custom string type with a limited alphabet

In my postgres database I have a column which contains sequences of characters. The characters in these sequences are amino acids. There are only 20 amino acids plus some extra characters needed for special purposes. Currently these are stored with…
public static void
  • 1,153
  • 11
  • 20