Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each corresponding to a different amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description...).

The standard FASTA format looks like:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
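
A minimal sketch of reading this format in Python, assuming each record starts with a ">" header line followed by one or more sequence lines. The function name read_fasta is hypothetical, and the sequences in the example are shortened from the records above for illustration:

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    read_fasta is a hypothetical helper written for illustration only.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Shortened example records (assumption: truncated for brevity).
example = """>P31946|1433B_HUMAN 14-3-3 protein beta/alpha
MTMDKSELVQ
KAKLAEQAER
>P62258|1433E_HUMAN 14-3-3 protein epsilon
MDDREDLVYQ
"""

records = list(read_fasta(io.StringIO(example)))
for header, sequence in records:
    accession = header.split("|")[0]
    print(accession, len(sequence))
```

For real work, an established parser such as Biopython's Bio.SeqIO is usually a safer choice than a hand-written reader.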

A great deal of work in bioinformatics relates to storing (annotating), searching, and analyzing the sequences in these databases.

145 questions
0 votes, 1 answer

File is not shown in the folder generated via tcl script

I am trying to write a file using Tcl scripting (via VMD). When I type the command "dir" in the Tk/Tcl console, it shows the name of the file I am trying to generate. But when I try to open that file manually in the working directory, it is not even…
iqra khan
0 votes, 1 answer

Trying to find permeation events in each pore of an AQP protein embedded in a lipid bilayer with VMD

My project is based on MD simulation analysis of a system containing a water box and a lipid bilayer with an Aquaporin embedded in it. Simulations with a timestep of 150 ns were performed on this system to study water permeation and flow…
iqra khan
0 votes, 1 answer

How do I correctly add a chain ID to my pdb file?

I am trying to conduct some analysis with my single-chain PDB file (766 residues long), but it requires a chain ID. Currently, there isn't one. Here is a snippet of the pdb file: ATOM 1 N MET 1 -69.269 78.953 -91.441 1.00 0.00 …
skhan21
0 votes, 2 answers

Script for automating online tool query

I have a number of amino acid sequence strings that I want to use as input to a tool that studies their interactions with certain components of the human immune system (http://www.cbs.dtu.dk/services/NetMHCcons/). I wanted to ask what, if any,…
0 votes, 0 answers

Map datasets from Excel and R

I am trying to create master data for interaction analysis of proteins, using the STRING database from R and an external dataset in Excel (https://drive.google.com/file/d/1aJisbhWyqUFcx_wIBMxcDtw5fMIE-z5d/view?usp=sharing). I would greatly…
0 votes, 1 answer

Using "findall" to find a sequence motif for a protein sequence

I have a program that needs to take user input to find a FASTA file containing a protein sequence (and give an error if the file can't be found), then scan through the sequence and find four-letter sequences that follow these rules:…
0 votes, 0 answers

How to compare two txt files and then apply changes in one of them

I am trying to merge two text-format (PDB) files. The first (bigger) one contains the full set of data describing the protein; the second contains a very small set of data that changes just a small part (a set of coordinates). Example: Basic file (part): ATOM 605 …
0 votes, 1 answer

Protein prediction with MODELLER and Python

I am trying to run a Python script written by salilab as a tutorial for the MODELLER software. There is a Python file in this tutorial with this code: import pylab import modeller def r_enumerate(seq): """Enumerate a sequence in reverse…
0 votes, 2 answers

How to fix "unambiguous redirection" and "unknown option for the `s' command" for sed

I'm trying to merge pdbqt flexible files into one pdb using the following script: http://prosciens.com/prosciens/oldproscienssarl/files/flexrigidpdbqt2pdb_template.sh Problematic fragment: Let's merge the files First we clean up the model…
MadEye
0 votes, 0 answers

Proteins with one SS bond?

I would like to find proteins with exactly one SS bond. Is there a database where I can search for this? I've tried the advanced search on https://www.rcsb.org/, but there is no such option, or at least I could not find it.
Jake B.
0 votes, 1 answer

Unable to run Porter5: generating `.flatpsi` file instead of `.psi`

I am trying to use Porter5 to run protein secondary structure prediction on a FASTA file containing a bunch of protein sequences. I am using a Linux machine. For starters, I decided to try using the example file that gets downloaded along with…
0 votes, 1 answer

Proteomics: Create a MSnSet class file with MSnbase

I want to create an MSnSet file (proteomics data; the data are spectral counts), but I get error messages and I am stuck (after reading manuals, help pages, forums, etc.). You can get my files…
SkyR
0 votes, 1 answer

R Proteomics: issues with input file "ExpressionSet" : Processing information: msmsTest package

Updated question: I want to use the msmsTest package for statistics on my proteomics data (spectral counts). However, I get an error message when importing the file with the commands: e <- pp.msms.data(myStackoverflowexample) Error in…
SkyR
0 votes, 1 answer

Retrieve 13-mer peptide sequence from UniProt ID and specific residue

I have a list of UniProt IDs, each with a corresponding residue of interest (e.g. Q7TQ48_S442). I need to retrieve the +/-6 residues around the specific site within the protein sequence (in the example, the sequence I need would be DIEAEASEERQQE). Can you…
0 votes, 1 answer

Using ngram in clustering protein data (ngram.NGram.compare equivalent in R)

I have some sequence data to compare. The expected output is a distance matrix showing how similar each sequence is to the others. Previously, I used ngram.NGram.compare in Python and now I want to switch to R. I found ngram and biogram…
Hadij