Questions tagged [protein-database]

A file containing protein sequences together with corresponding metadata

Classical protein databases are text files containing a large number of protein sequences.

Protein sequences are represented as strings of uppercase letters, each letter standing for one amino acid. Each protein sequence is preceded by a header line containing metadata (protein reference number, name, description...).

The standard FASTA format looks like:

>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
YYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVF
YYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGE
EQNKEALQDVEDENQ
>.........................................................

A great deal of work in bioinformatics involves storing (annotating), searching, and analyzing the sequences in these databases.
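The header-plus-sequence layout described above can be read with a few lines of plain Python. The following is a minimal sketch, not a reference parser: the function name `parse_fasta` is hypothetical, the sample text is an abridged copy of the two records shown above, and splitting the header on `|` to get the accession is an assumption that only holds for headers shaped like the ones in this example.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are wrapped; join them back into one string.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Abridged sample data (sequences shortened for illustration only).
example = """\
>P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS
AGEGEN
>P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASW
EQNKEALQDVEDENQ
"""

records = list(parse_fasta(example.splitlines()))
for header, seq in records:
    # Assumption: the accession is the text before the first "|".
    accession = header.split("|", 1)[0]
    print(accession, len(seq))
```

For real databases a maintained parser such as Biopython's `Bio.SeqIO.parse(handle, "fasta")` is usually a better choice, since it copes with format variations this sketch ignores.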

145 questions
1 vote
4 answers

Sort nested hash with multiple conditions

I'm somewhat new to Perl programming and I've got a hash which could be formulated like this: $hash{"snake"}{ACB2} = [70, 120]; $hash{"snake"}{SGJK} = [183, 120]; $hash{"snake"}{KDMFS} = [1213, 120]; $hash{"snake"}{VCS2} = [21,…
1 vote
1 answer

How to select only RNA with hetero atoms from a PDB file with Python?

I'm trying to separate RNA from protein in a complex protein/RNA PDB file and I want all RNA info with the hetero atoms in between the bases BUT without H2O etc. In short, I want the RNA part of the PDB file without discontinuous lines. I managed to separate…
Raph
1 vote
1 answer

How to save a molmap/density map in Chimera in MRC format using a Python script or the command line?

I know how to save a molmap manually, but failed to do it using a script. When I used the command save or export, the file was saved as .py or .x3d, not as .mrc. What should I do to correctly save the file using a script or the command line? Thanks a lot.
1 vote
1 answer

TypeError when creating PDB file using Biopython's PDBIO, only with certain files

I am writing a script that renumbers protein structures (CIF files) and then saves them (PDB files: Biopython does not have a CIF saving function). For most of the files I use, it works. But for files like 6ek0.pdb, 5t2c.pdb, and 4v6x.pdb I keep…
Janne B
1 vote
1 answer

How to find percentage of occurrence of characters in an argument?

What should I do to calculate the percentage of occurrence of characters in an argument if the data…
1 vote
2 answers

How to paste values of specific columns of a file into another command?

I want to use fastacmd to extract specific regions of FASTA sequences. To do that I need to supply the name of the FASTA file (-d), the name of the sequence (-s), and the position of the sequence to extract (-L). For example: fastacmd -d OAP11402.1.fa -s…
1 vote
1 answer

How to move protein coordinates with respect to a reference frame

I have a PDB file '1abz' (https://files.rcsb.org/view/1ABZ.pdb), which contains the coordinates of a protein structure. Please ignore the lines of the header remarks; the interesting information starts at line 276, which says 'MODEL 1'. I would…
Cave
1 vote
1 answer

How can I use mmCIF format instead of PDB in BioJava?

I have a small problem... I know that to download a PDB structure using BioJava I should use Structure s = StructureIO.getStructure("code"); What should I do to use an mmCIF file instead?
fafnir1990
1 vote
2 answers

Sorting data using key in Python

I got a data format like: ATOM 124 N GLU B 12 ATOM 125 O GLU B 12 ATOM 126 OE1 GLU B 12 ATOM 127 C GLU B 12 ATOM 128 O GLU B 14 ATOM 129 N GLU B 14 ATOM 130 OE1 GLU B 14 ATOM 131 OE2 GLU B 14 ATOM 132 CA GLU B 14 ATOM 133 C GLU B 15 ATOM 134 CA…
diffracteD
1 vote
2 answers

I have a protein sequence file and want to count trimers in it using sed or grep

I have a protein sequence file in the following format: uniprotID\space\sequence. The sequence is a string of any length but with only 20 allowed letters, i.e. ARNDCQEGHILKMFPSTWYV. Example of one record: Q5768D AKCCACAKCCAC. I want to create a csv file in…
samar
1 vote
1 answer

Does the DSSP class of Biopython give the relative solvent accessibility value of amino acids?

I would like to obtain the relative solvent accessibilities of the amino acids in a protein, at present using the DSSP module of Biopython. I am not sure whether the output already contains the RSA (relative solvent accessibility) or whether it needs to be calculated. Any help would be…
SChaurasia
1 vote
1 answer

BLAST: blastpgp not producing an output file, unsure if using database flag correctly

Question 1: I'm running the following: blast-2.2.26/bin/blastpgp -i protein.fasta -j 5 -o file -d nr where protein.fasta is a FASTA file containing a single protein sequence. This produces no output, and the file given to -o is empty. Question 2: I was able…
user1539097
1 vote
2 answers

Replace text using awk and sed at every 2-3-4 lines for a PDB file

I have a PDB text file with about 200 000 rows. Every row looks like this: COMPND SOURCE HETATM 1 CT 100 1 -23.207 17.632 14.543 HETATM 2 CT 99 1 -22.069 18.353 15.280 HETATM 3 OH 101 1 -21.074 …
Grego
1 vote
3 answers

OpenGL code to render ribbon diagrams for protein

I am looking to render ribbon diagrams of proteins using OpenGL and C++. Does anyone know if any open source code for this already exists, or if there are good guides to do this? If not, I'd prefer to figure it out myself ;) but I didn't want to…
eipxen
1 vote
3 answers

Extract Columns from a Protein Data Bank (PDB) Text File

I want to make a plot with Matplotlib in Python, and to do so I need to read some data from a PDB file (Protein Data Bank). I want to extract every column from the file and store these columns in separate vectors. The PDB file consists of columns with both…
Djamillah