Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description...).

The standard FASTA format looks like:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................

A great deal of work in bioinformatics relates to storing (annotating), searching, and analyzing the sequences in these databases.
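As a minimal sketch of how such a file can be read, the Python snippet below splits FASTA text into header/sequence pairs using only the standard library. The names `parse_fasta` and `accession` are hypothetical helpers written for this illustration, and taking the text before the first `|` as the accession is an assumption based on the header layout in the example above; real databases use several header conventions.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header: str) -> str:
    """Return the text before the first '|' (assumed header layout)."""
    return header.split("|", 1)[0]


# The two records from the example above.
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
"""

records = {accession(h): seq for h, seq in parse_fasta(example.splitlines())}
for acc, seq in records.items():
    print(acc, len(seq))
```

For real work, an established parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is usually a safer choice than a hand-written loop, since it already handles the common variations of the format.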

145 questions
1 vote, 1 answer

Database of interacting proteins

I am doing a protein-protein interaction network clustering project. To test the results, I downloaded the data set from the DIP database. It has a DIP-id for each protein, and I want to compare the clusters (DIP-id as protein name) with the golden…
deepthi vr
  • 11
  • 1
1 vote, 2 answers

How to extract all chains from a PDB file?

I followed the page "How to extract chains from a PDB file?" but could not find a complete solution for what I want. Here is my question: without giving a particular chain ID, I want to extract all the chain IDs in the PDB file and write these chain IDs…
Exchhattu
  • 197
  • 3
  • 15
1 vote, 1 answer

NCBI protein database: how to get protein sequences from a specific BioProject (Python script)

I am trying to retrieve coding protein sequences from the NCBI database for specific BioProjects. This can be achieved somehow using a web browser. For instance, you can find the specific BioProject you are interested in and "click" on the associated…
1 vote, 2 answers

How to find the centre of a structure (Python code)

I'm a beginner at Python coding. I'm working with structural coordinates. I have a PDB structure of 1000 atoms with xyz coordinate information. My structure can have any shape. I am struggling to find the center point inside the…
awanit
  • 263
  • 1
  • 2
  • 11
1 vote, 1 answer

Issue with Bio.Entrez and protein in Biopython 1.60

I'm having an issue using Bio.Entrez to search for a protein. I'm doing this: >>> handle=Entrez.esearch(db="protein", term="insulin AND homo") >>> record=Entrez.read(handle) Traceback (most recent call last): File "", line 1, in File…
1 vote, 2 answers

Protein Sequence Alignment from Protein Data Bank to COSMIC or UniProt

I would like to match up PDB files from the Protein Data Bank to canonical AA sequences for the protein as displayed in COSMIC or UniProt. Specifically, what I need to do is pull from the PDB file the alpha-carbon atoms in the backbone and their xyz…
user1357015
  • 11,168
  • 22
  • 66
  • 111
0 votes, 1 answer

PDB (Protein Data Bank) format - ligand removal

I would like to remove various ligands from PDB records. Is it sufficient to remove HET, HETNAM, HETATM, ..., i.e. the records in which a compound is identified by its three-letter code, or is it necessary to clean some other fields? Is there any python|perl…
0 votes, 0 answers

Download multiple sequences in one FASTA file from UniProt using Python 3, and add a progress bar using the Requests library

I made a Python script for downloading protein sequences from UniProt in FASTA format. The script reads accession numbers from a text file (one per line) and then tries to download the respective sequence…
0 votes, 0 answers

I would like to retrieve protein sequences from many protein IDs

library(biomaRt) biomart <- "plants_mart" dataset <- "athaliana_eg_gene" mart <- useMart(biomart = biomart, dataset = dataset) protein_ids <- c("AT5G67220.1", "AT1G07110.1", "AT5G39570.1") protein_sequences <- getSequence(id = protein_ids, type…
0 votes, 0 answers

Model validation RMSE score is fine but testing is really bad

I'm training a Linear Regression Neural Network on embeddings of protein data. The problem I'm facing is that for the training and validation loss scores I'm getting decent results, but once I try the testing dataset (which I don't have access to) I…
Moe_blg
  • 71
  • 4
0 votes, 0 answers

R: unable to use select() on variables in dataframe from PDB

I have been using R but am new to PDB files for protein structure. I hope to modify values in PDB files as if they were an ordinary data frame. Here's what I did: library(bio3d) library(tidyverse) pdb <- read.pdb("7nhx") atom <- as.data.frame(pdb$atom) but…
Emily
  • 1
0 votes, 0 answers

Error when adding centrality_type to xina_plot_all function in XINA package

I am trying to output some visuals using the xina_plot_all() function in the XINA package, but when I add centrality_type = 'Eigenvector' to xina_plot_all() I get the following error: Error in if (is.na(vertex.color)) { : the condition has…
0 votes, 1 answer

Open Babel Warning in PerceiveBondOrders

I am trying to use the gnina docking tool for protein-ligand docking, but the output file I get contains only the ligand (there is no protein in the PDB file). Is there a problem with Open Babel in gnina, or with the input file format? The command is gnina -r 6vl4/6vl4.pdb -l…
0 votes, 1 answer

Suggestions on Analyzing Protein Sequence Similarity

I want to write code to analyze short protein sequences and determine their similarity. I have no reference sequence but rather I want to write some sort of for loop to compare them all to each other to see how many duplicate sequences I have, as…
0 votes, 0 answers

Getting GenBank info automatically in a Python script

Is there a method in Biopython where I can get the entire GenBank file just by inputting the accession number of the protein? From what I'm seeing, I would have to have the gb file locally, or scrape it from the GenBank website. Is there a…
asal
  • 1
  • 1