Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each letter corresponding to one amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description, and so on).

The standard FASTA format looks like this:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................
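
As a rough illustration of the layout above, here is a minimal Python sketch that splits FASTA text into (header, sequence) pairs and joins the wrapped sequence lines. The function names `read_fasta` and `split_header` are hypothetical, and the header layout (`ACCESSION|ENTRY_NAME description`) is assumed from the example records shown here; real database files may use a different header layout.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are wrapped; collect them and join later.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Tuple[str, str, str]:
    """Split 'ACCESSION|ENTRY_NAME description' (layout assumed from the example)."""
    ident, _, description = header.partition(" ")
    accession, _, entry_name = ident.partition("|")
    return accession, entry_name, description


# The two records from the example above.
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
"""

records = list(read_fasta(example.splitlines()))
for header, sequence in records:
    accession, entry_name, _ = split_header(header)
    print(accession, entry_name, len(sequence))
```

To read from disk instead, an open file handle can be passed directly, since `read_fasta` accepts any iterable of lines. For real projects an established parser such as Biopython's `SeqIO` is usually the safer choice.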

A great deal of work in bioinformatics relates to storing, annotating, searching, and analyzing the sequences in these databases.

145 questions
1
vote
1 answer

How can I print out multiple similar patterns I have matched in perl?

So I got a uniprot flat file in fasta format (https://www.uniprot.org/uniprot/P30988.txt here is the link but it's supposed to be able to work with any uniprot flat file) and I want to essentially take the transmembrane parts from the sequence and…
Nickmofoe
  • 99
  • 6
1
vote
1 answer

How can I find a protein sequence from a FASTA file using perl?

So I have an exercise in which I have to print the first three lines of a FASTA file as well as the protein sequence. I have tried to run a script I wrote, but Cygwin doesn't seem to print the sequence out. My code is as follows: #!usr/bin/perl open…
Nickmofoe
  • 99
  • 6
1
vote
2 answers

Simple PDB Library

I am looking for a simple C++ library for extracting atom coordinates from a pdb file. Most I've come across do too much for my simple needs, making them unnecessarily complex.
moinudin
  • 134,091
  • 45
  • 190
  • 216
1
vote
2 answers

How can I parse alternative atom information in a PDB file?

I am trying to parse PDB files. Say, a PDB file has the following data: ATOM 33 N ATHR A 2 4.935 -11.632 15.046 0.74 2.95 N ATOM 34 N BTHR A 2 5.078 -11.406 15.180 0.31 2.78 N ATOM 35 CA…
user366312
  • 16,949
  • 65
  • 235
  • 452
1
vote
1 answer

How to index a text field or make it searchable in MySQL?

I have stored about 7 million biological protein sequences in a text field of a MySQL table (using the InnoDB storage engine and latin1_swedish_ci collation). Sequences stored in MySQL are simple combinations of uppercase English letters. Like…
Rashid
  • 1,244
  • 3
  • 13
  • 29
1
vote
1 answer

MD analysis using Python

I am trying to run this command: python3 traj_orientation-group.py -g fnIII-9_ps20_Nchain6_T298_nw.gro -x fnIII-9_ps20_Nchain6_T298_nw.xtc -o fnIII-9_ps20_Nchain6_run1.phrsn-orientation --protein_res_start 1 --protein_res_stop 89 –group 51 52 53 54…
1
vote
1 answer

Biopython : how to extract only relevant atom and save a pdb file (not locally)?

Using Biopython. I have a list of atoms. rep_atoms = [CA, CB, CD3] (Carbon atoms). I want to save only these from any given PDB file. I don't want to save it locally; I want it to save in the memory (Lots of iteration). I have arrived at the code…
1
vote
1 answer

Confused with psfgen module of VMD

I am fairly new to VMD and programming in general. I need to combine two pdb files of subunits into a combined pdb and psf file with both subunits. I used the Namd tutorial and used two pdb files named BChain270VerCTrue.pdb and barn_noH2o_ChainD.pdb…
ChrisG
  • 11
  • 3
1
vote
0 answers

confused about the annotation on pdb website

I don't understand the meaning of C [auth B] in the Chain columns. Also, I am wondering why the name of the molecule is so long in the second figure. Thanks in advance.
mark Lan
  • 41
  • 1
1
vote
1 answer

UE4, C++, fscanf, match strings with specifiers

I am new to Unreal Engine, coming from Unity and as well new to C++. I am trying to import PDB files directly into the engine by using fscanf. The section of the PDB file which I am attempting to capture is shown below: ATOM 15 H1 ATHR A 1 …
Mortamass
  • 21
  • 1
1
vote
1 answer

Keeping format of a pdb file with conditionals

I'm new in awk, and I'm trying to modify column 3 (with numeration about NR) if column 1 has the word HETATM. My input file is: HETATM 25 O UNL 1 86.047 83.059 103.165 1.00 0.00 O HETATM 26 N UNL 1 87.071 …
1
vote
1 answer

How to embed nglview in a PyQt5 window

I am trying to implement an application using PyQt5 where I wish to show protein structure using nglview in the application window. Is it possible to do so? If yes, can anybody suggest how to embed nglview in PyQt5 or any similar package like…
Prashik
  • 33
  • 5
1
vote
2 answers

What changes do I need to make in this python code to convert DNA sequences to protein?

I have to first find codon "ATG" then once it finds the codon it should translate using the dictionary. I am currently getting no output. I am still inexperienced in python so have trouble writing code with proper syntax def translate(line, table): …
1
vote
2 answers

Awk replace column AFTER matched line

I have a PDB file that is returned from a receptor/ligand docking prediction. I don't know why the authors of the program named the chains "A" for both receptor and ligand, but I want to change it. This should be a basic thing that I want to do…
Brian Wiley
  • 485
  • 2
  • 11
  • 21
1
vote
1 answer

Time series data for water molecules from its dcd file

I am trying to make a file which contain time series data of water molecules from dcd file. Is it possible to generate this data using any of MDAnalysis module or function? Or is there any python script to generate this file? I need to generate this…