Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, etc.).

The standard FASTA format looks like:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
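
As a rough illustration of the layout above, here is a minimal Python sketch that reads such records using only the standard library. The function names `parse_fasta` and `split_header` are hypothetical, and the header split assumes the `ACCESSION|ENTRY_NAME description` layout seen in the example; real files may use other header conventions.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated into one string per record.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Tuple[str, str, str]:
    """Split a header into (accession, entry name, description).

    Assumes the 'ACCESSION|ENTRY_NAME description' layout of the example.
    """
    identifier, _, description = header.partition(" ")
    accession, _, entry_name = identifier.partition("|")
    return accession, entry_name, description


if __name__ == "__main__":
    example = [
        ">P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3",
        "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS",
        "YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD",
        "AGEGEN",
    ]
    for header, sequence in parse_fasta(example):
        accession, entry_name, _ = split_header(header)
        print(accession, entry_name, len(sequence))
```

For anything beyond a quick script, an established parser such as Biopython's `Bio.SeqIO` is probably the safer choice.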

A great deal of work in bioinformatics relates to storing (annotating), searching and analyzing the sequences in these databases.

145 questions

vote

1 answer

Database of interacting protein

I am doing a protein-protein interaction network clustering project. To test the results I downloaded the data set from the DIP database. It has a DIP-id for each protein and I want to compare the clusters (DIP-id as protein name) with the golden…

protein-database

asked Feb 01 '15 at 07:20

deepthi vr

vote

2 answers

How to extract all chains from a PDB file?

I follow this page How to extract chains from a PDB file? but I am not able to find complete solution of what I want. Here is my question: without giving particular chain id, I want to extract all the chain ids in the pdb and write these chain ids…

biopython chain protein-database

asked Sep 05 '14 at 00:57

Exchhattu

vote

1 answer

NCBI protein database: how to get protein sequences from a specific bioproject (Python script)

I am trying to retrieve coding protein sequences from the NCBI database for specific bioprojects. This can be achieved somehow using a web browser. For instance you can find the specific bioproject you are interested in and "click" on the associated…

python ncbi protein-database

asked Nov 14 '13 at 13:14

user2991786

vote

2 answers

How to find a centre in the structure. python code

I'm a beginner at Python coding. I'm working with structural coordinates. I have a PDB structure of 1000 atoms with xyz coordinate information. My structure can have any shape. I am struggling to find the center point inside the…

python math numpy shapes protein-database

asked Sep 10 '13 at 12:10

awanit

vote

1 answer

Issue with Bio.Entrez and protein in Biopython 1.60

I'm having an issue using Bio.Entrez to search for a protein. I'm doing this: >>> handle=Entrez.esearch(db="protein", term="insulin AND homo") >>> record=Entrez.read(handle) Traceback (most recent call last): File "", line 1, in File…

python bioinformatics biopython ncbi protein-database

asked Jan 26 '13 at 02:50

alejo0317

vote

2 answers

Protein Sequence Alignment from Protein Databank to Cosmic or Uniprot

I would like to match up PDB files from the Protein Databank to canonical AA sequences for the protein as displayed in Cosmic or Uniprot. Specifically, what I need to do is pull from the pdb file, the carbon alpha atoms in the backbone and their xyz…

r alignment sequence bioconductor protein-database

asked May 28 '12 at 15:09

user1357015

votes

1 answer

pdb protein bank format - ligand removal

I would like to remove various ligands from PDB records. Is it sufficient to remove HET, HETNAM, HETATM, ..., i.e. those records where the compound is identified by its 3-letter code, or is it necessary to clean some other fields? Is there any python|perl…

protein-database pdb

asked Oct 07 '11 at 18:44

user979678

votes

0 answers

Download multiple sequences in one fasta file from UniProt using Python 3, and add a progress bar using Requests library

I made a python script for downloading protein sequences from Uniprot in fasta format. The script will read the accession numbers from a text file containing the accession numbers (one on each line) and then try to download the respective sequence…

python-3.x python-requests progress-bar fasta protein-database

asked Jul 19 '23 at 07:39

Irfan

votes

0 answers

I would like to retrieve protein sequences from many protein IDs

library(biomaRt) biomart <- "plants_mart" dataset <- "athaliana_eg_gene" mart <- useMart(biomart = biomart, dataset = dataset) protein_ids <- c("AT5G67220.1", "AT1G07110.1", "AT5G39570.1") protein_sequences <- getSequence(id = protein_ids, type…

r dataset fasta protein-database biomart

asked Jul 18 '23 at 16:39

code_newbie

votes

0 answers

Model validation RMSE score is fine but testing is really bad

I'm training a linear regression neural network on embeddings of protein data. The problem I'm facing is that for the training and validation loss scores I'm getting decent results but once I try the testing dataset (which I don't have access to) I…

python pytorch linear-regression protein-database

asked May 14 '23 at 00:15

Moe_blg

votes

0 answers

R: unable to use select() on variables in dataframe from PDB

I have been using R but am new to PDB files for protein structure. I hope to modify values in PDB files as if they were an ordinary data frame. Here's what I did: library(bio3d) library(tidyverse) pdb <- read.pdb("7nhx") atom <- as.data.frame(pdb$atom) but…

r protein-database

asked May 04 '23 at 15:23

Emily

votes

0 answers

Error when adding centrality_type to xina_plot_all function in XINA package

I am trying to output some visuals using the xina_plot_all() function in the XINA package, but when I add centrality_type = 'Eigenvector' to xina_plot_all() I get the following error: Error in if (is.na(vertex.color)) { : the condition has…

r debugging error-handling protein-database

asked May 02 '23 at 22:25

Steven DeLaGarza

votes

1 answer

Open Babel Warning in PerceiveBondOrders

I am trying to use the gnina docking tool for protein-ligand docking, but the output file I get includes only the ligand (there is no protein in the PDB). Is there a problem with Open Babel in gnina, or with the input file format? The command is gnina -r 6vl4/6vl4.pdb -l…

python protein-database sdf openbabel

asked Mar 30 '23 at 00:25

JeongSoo Na

votes

1 answer

Suggestions on Analyzing Protein Sequences Similarity

I want to write code to analyze short protein sequences and determine their similarity. I have no reference sequence but rather I want to write some sort of for loop to compare them all to each other to see how many duplicate sequences I have, as…

python r bioinformatics protein-database sequencing

asked Feb 07 '23 at 18:52

cosmomush

votes

0 answers

Getting GenBank info automatically in a Python script

Is there a method in Biopython where I can get the entire GenBank file just by inputting the accession number of the protein? From what I'm seeing, I would have to have the gb file locally, or scrape it from the GenBank website. Is there a…

python bioinformatics biopython protein-database

asked Dec 09 '22 at 05:34

asal
