Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each letter corresponding to one amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, ...).

The standard FASTA format looks like this:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
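As a rough illustration of how the format above can be read, here is a minimal Python sketch that uses only the standard library. It assumes headers shaped like the example: an identifier token (here `accession|entry_name`), a free-text description, then `KEY=value` fields such as `OS=`, `GN=`, `PE=` and `SV=`. The names `FastaRecord`, `parse_header`, `read_fasta` and `EXAMPLE` are hypothetical and belong to no library; for real work a maintained parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is likely the safer choice.

```python
import io
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, TextIO, Tuple

# Assumed header layout (hypothetical, modelled on the example records):
#   >IDENTIFIER free-text description KEY=value KEY=value ...
# A KEY is two uppercase letters; a value runs up to the next " KEY=" or end of line.
FIELD_RE = re.compile(r"\b([A-Z]{2})=(.*?)(?=\s[A-Z]{2}=|$)")


@dataclass
class FastaRecord:
    identifier: str         # first header token, e.g. "P31946|1433B_HUMAN"
    description: str        # text before the first KEY=value field
    fields: Dict[str, str]  # e.g. {"OS": "Homo sapiens", "GN": "YWHAB", ...}
    sequence: str           # residues with line breaks removed


def parse_header(header: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a header (without the leading '>') into identifier, description, fields."""
    identifier, _, rest = header.partition(" ")
    first = FIELD_RE.search(rest)
    description = rest[: first.start()].strip() if first else rest.strip()
    fields = {key: value.strip() for key, value in FIELD_RE.findall(rest)}
    return identifier, description, fields


def read_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one record per '>' line, joining the sequence lines that follow it."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield FastaRecord(*parse_header(header), "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data wrapped over several lines
    if header is not None:
        yield FastaRecord(*parse_header(header), "".join(chunks))


# The two complete records from the example above.
EXAMPLE = """>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
"""

if __name__ == "__main__":
    for rec in read_fasta(io.StringIO(EXAMPLE)):
        print(rec.identifier, rec.fields.get("GN"), len(rec.sequence))
```

A tolerant reader like this skips blank lines and silently ignores any text before the first `>`; whether that is desirable depends on how strictly the input should be validated.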

Much of the work in bioinformatics involves storing (annotating), searching, and analyzing the sequences in these databases.
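At its simplest, searching means scanning each sequence for a short pattern. The sketch below (Python, standard library only) reports where a regular-expression motif occurs in each sequence. `find_motif` and `FRAGMENTS` are hypothetical names, the fragments are just the first sequence line of each example record, and the motif `LSVAYKNV[VI]` was picked only because it occurs in both. This is exact pattern matching, not the similarity search performed by tools such as BLAST.

```python
import re
from typing import Dict, List

# First sequence line of each record from the example (illustrative subset only).
FRAGMENTS: Dict[str, str] = {
    "1433B_HUMAN": "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS",
    "1433E_HUMAN": "MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW",
}


def find_motif(sequences: Dict[str, str], pattern: str) -> Dict[str, List[int]]:
    """Map each sequence name to the 1-based start positions of `pattern`.

    Sequences without a match are left out of the result.
    """
    # Wrapping the pattern in a zero-width lookahead lets finditer report
    # overlapping hits; a plain search would resume after the end of each hit.
    overlapping = re.compile(f"(?=({pattern}))")
    hits: Dict[str, List[int]] = {}
    for name, seq in sequences.items():
        positions = [m.start() + 1 for m in overlapping.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # [VI] allows either valine or isoleucine at the last motif position.
    print(find_motif(FRAGMENTS, "LSVAYKNV[VI]"))
```

Because the lookahead is zero-width, a motif such as `AA` is reported at positions 1, 2 and 3 in `AAAA`; dropping the lookahead would return only non-overlapping hits.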

145 questions
0 votes, 1 answer

How to delete lines that match elements from another file

I am in the process of learning Perl and I am trying to figure out how to do this task. I have a folder with a bunch of text files and I have a file ions_solvents_cofactors that contains a list of three-letter entries. I wrote a script that opens and…
milan
0 votes, 0 answers

Importing PyMol modules into python program in Eclipse

Hi, I'm a complete beginner with Python and PyMol, but I've done a lot of work in Java. I am trying to import PyMol modules (I think they're called modules or libraries) into my Python program in Eclipse, but I keep getting the error "No module named…
Po Don
0 votes, 0 answers

How to BLAST against TrEMBL only

I am doing command-line BLAST with BioPython. I need to search against TrEMBL, but I have not been able to find how to do that. I know I can select db="nr", and that works, but I want to search only against TrEMBL, which constitutes a subset of nr…
0 votes, 2 answers

what does it mean? about Python regular expression

Last time my question was like, (How can I get contents between square brackets by using regular expression?) #start gene g1 dog1 dog2 dog3 #protein sequence = [DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD] #end gene g1 ### #start gene…
tehoo
0 votes, 2 answers

Bio.PDB mmcif2dict module not callable

I fetched a crystal structure of a protein using the function retrieve_pdb_file from Bio.PDB. The default format has changed from PDB to PDBx/mmCIF. I want to extract the protein sequence from the header in the CIF file. There is supposed to be a…
0 votes, 3 answers

Comparing two words from different lines in a file using python

I am working with a file from the protein data bank which looks something like this. SITE 2 AC1 15 ASN A 306 LEU A 309 ILE A 310 PHE A 313 SITE 3 AC1 15 ARG A 316 LEU A 326 ALA A 327 ILE A 345 …
GokRix
0 votes, 1 answer

Biopython Array Addition Error (Open for all)

Okay, let me explain things first. I have used a specific module named Biopython in this code. I am explaining the details needed to solve the problem in case you are not accustomed to the module. The code is: #!/usr/bin/python from…
0 votes, 1 answer

How to get network into protein shape in cytoscape?

I have a protein network loaded into Cytoscape. I want to get the network nodes into a shape that corresponds to that of the protein shape/structure, so that I can superimpose the network image onto the protein structure image. I tried…
Adeetya
0 votes, 2 answers

Extract multiple protein sequences from a Protein Data Bank along with Secondary Structure

I want to extract protein sequences and their corresponding secondary structure from any protein data bank, say RCSB. I just need short sequences and their secondary structure, something like: ATRWGUVT Helix. It is fine even if the sequences are…
Rama
0 votes, 0 answers

How to read specific atomic coordinates from a PDB file using C

#include #include #include int main() { FILE *pdb; pdb=fopen("5an3.pdb", "r"); if(feof(pdb)) { fprintf(stderr,"File reading error!!! Probably your PDB file doesn't contain anything or poorly…
0 votes, 1 answer

Dictionary key from pdb file

I'm trying to go through a .pdb file, calculate distance between alpha carbon atoms from different residues on chains A and B of a protein complex, then store the distance in a dictionary, together with the chain identifier and residue number. For…
Alexandra
0 votes, 1 answer

length of the longest possible string contains no repeated 3-mers

I'm trying to find the length of the longest possible string of consecutive digits that contains no repeated 3-mers. This is a bioinformatics question, and I'm sorting this for protein sequence. basically, something like 0102340109 does not work…
JY078
0 votes, 1 answer

Extracting Fasta Moonlight Protein Sequences with Python

I want to extract the FASTA files that contain the amino acid sequences from the Moonlighting Protein Database ( www.moonlightingproteins.org/results.php?search_text= ) via Python, since it's an iterative process, which I'd rather learn how to program…
0 votes, 1 answer

Looking at the results of clustering algorithms on Protein Interaction Networks

I am working on a project involving the clustering of protein interaction networks. Having made several clustering algorithms on the graphs of interacting proteins, I am somewhat confused about how I would now go about seeing whether the clusters…
paulmorio
0 votes, 1 answer

BioPython: Residues size differ from position

I'm currently working with a data set of PDBs and I'm interested in the sizes of the residues (number of atoms per residue). I realized the number of atoms (len(residue.child_list)) differed between residues in different proteins even though being the…
rodgdor