Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description...).

The standard FASTA format looks like:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................

A great deal of work in bioinformatics relates to storing (annotating), searching, and analyzing the sequences in these databases.
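
As an illustration of the header-plus-sequence layout shown above, here is a minimal Python sketch that reads FASTA text into (header, sequence) pairs. The function names `parse_fasta` and `split_header` are hypothetical, and `split_header` assumes the `accession|entry_name description` header layout of the example above; other databases format their headers differently.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are wrapped; join them back into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Dict[str, str]:
    """Split a header line into its parts.

    Assumption: the header looks like "accession|entry_name description",
    as in the example above. This is not a general FASTA rule.
    """
    ident, _, description = header.partition(" ")
    accession, _, entry_name = ident.partition("|")
    return {
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    print(split_header(header)["accession"], len(sequence))
```

For real work, an established parser such as Biopython's `Bio.SeqIO` is usually a better choice than hand-rolled code; the sketch is only meant to make the format concrete.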

145 questions
0
votes
2 answers

Retrieving DNA sequences from a database of protein sequences?

I have thousands of protein sequences in FASTA format, along with their accession numbers. I want to go back into the whole genome shotgun database and retrieve all DNA sequences that encode a protein identical to one in my initial list of sequences. I've tried…
Andrew
  • 33
  • 3
0
votes
2 answers

Want to pull a journal title from an RCSB Page using python & BeautifulSoup

I am trying to get specific information about the original citing paper in the Protein Data Bank given only the 4-letter PDB ID of the protein. To do this I am using the Python libraries requests and BeautifulSoup. To try and build the code, I went…
0
votes
1 answer

Matlab : How to highlight GLYCINE residues in my Ramachandran plot?

I am trying to draw a Ramachandran plot in MATLAB without using the built-in command, and I have succeeded. Now I want to pick out only the glycines in the scatter array. Any ideas how to do this? (link to 1UBQ.pdb file :…
dexterdev
  • 537
  • 4
  • 22
0
votes
2 answers

Matlab : querying all the nitrogen coordinates in pdb file?

I am trying to extract the nitrogen coordinates from the ubiquitin protein. I have the 1UBQ.pdb file from the http://rcsb.org/pdb/home/home.do website. I have done the following. pdb1…
dexterdev
  • 537
  • 4
  • 22
0
votes
1 answer

Error in downloading pdb from protein data bank using biopython

Some PDB entries cannot be downloaded from the PDB using Biopython, even though they exist in the PDB. An error is generated. This code is used to download the entry (2j8e). It fails for this entry but works for other entries. Python 2.7.4 (default, May 14 2013,…
Exchhattu
  • 197
  • 3
  • 15
0
votes
1 answer

making arrays from data in files and subtracting them

I am trying to find the distance between objects in 3D from a Protein Data Bank (PDB) file. A PDB file looks like this. Example: ATOM 1 N GLU 1 -19.992 -2.816 36.359 0.00 0.00 PROT ATOM 2 HT1 GLU 1 -19.781 …
0
votes
3 answers

Can any one help me understand and solve this error?

I would like to plot the distribution of alpha-carbon to nitrogen bond distances in the ubiquitin protein. So I downloaded 1UBQ.pdb from the RCSB website. Now, using Biopython, I am trying to find the distances between all alpha-carbon (CA) and nitrogen (N)…
dexterdev
  • 537
  • 4
  • 22
0
votes
3 answers

Performing a function on each combination of variables in two arrays

I am trying to take one set of data and subtract each value of a second set of data from each of its values. For example: Data set one (1, 2, 3) Data set two (1, 2, 3, 4, 5) So I should get something like (1 - (1 .. 5)) then (2 - (1..5)) and so on. I currently…
0
votes
2 answers

extracting highly similar proteins from a protein database

How can I get highly similar structures from the PDB database? Let's say structures with 98% or higher sequence similarity?
Steve Grafton
  • 1,821
  • 4
  • 17
  • 18
0
votes
0 answers

Can d3 draw pfam domain

Just a quick question. Can I use d3 to draw protein domains like the following? Image of a protein domain My plan is to attach these little protein domains to a tree, which would look awesome. Thanks a lot in advance!
fabsta
  • 147
  • 1
  • 4
  • 13
0
votes
1 answer

How to edit information of UniProt downloads (either txt or XML)

I downloaded UniProt files for a group of proteins (n>1000, so manually checking these proteins is not an option). The complete data files come as either a flat text file or an XML file. There is a lot of information present in these files (for an…
0
votes
2 answers

Pull Alignment Character Position

I use pairwise align to get the following: > alignment <-pairwiseAlignment(pattern = canonical.protein, subject=protein.extracted) > alignment Global PairwiseAlignedFixedSubject (1 of 1) pattern: [448] …
user1357015
  • 11,168
  • 22
  • 66
  • 111
-1
votes
1 answer

Using Kaggle code/model to predict classifications for unseen dataset

I have obtained the following code along with a dataset from a Kaggle notebook: https://www.kaggle.com/code/danofer/predicting-protein-classification/notebook import pandas as pd import numpy as np from matplotlib import pyplot as plt import seaborn…
Rashid
  • 1,244
  • 3
  • 13
  • 29
-1
votes
1 answer

find motifs in a FASTA file with Python

Can someone help me with this Python code? When I run it, nothing happens. No errors or anything weird to me. It reads in and opens the file just fine. I have a set of protein sequences in FASTA format and I have to find motifs in my sequences like…
-1
votes
1 answer

Reading lines from a pdb file

I want to read only those lines that contain "ATOM" as the first word and write them to a file using Fortran code. I have tried to write the code but was unable to read only the lines containing the word "ATOM". I hope someone can help me in this…
anupama
  • 1
  • 2