
I want to write a program that extracts the amino acid sequence of each feature of type "Region" into a separate FASTA file, and that lists the amino acids and positions of every "Site" feature with site_type="phosphorylation".

WITHOUT USING Biopython PACKAGE.

(I already have Biopython code that does the same thing.)

The file is below:

LOCUS       NP_005219               1210 aa            linear   PRI 15-MAR-2015
DEFINITION  epidermal growth factor receptor isoform a precursor [Homo
            sapiens].
ACCESSION   NP_005219
VERSION     NP_005219.2  GI:29725609
DBSOURCE    REFSEQ: accession NM_005228.3
KEYWORDS    RefSeq.
FEATURES             Location/Qualifiers
     source          1..1210
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="7"
                     /map="7p12"
     Protein         1..1210
                     /product="epidermal growth factor receptor isoform a
                     precursor"
                     /EC_number="2.7.10.1"
                     /note="avian erythroblastic leukemia viral (v-erb-b)
                     oncogene homolog; cell proliferation-inducing protein 61;
                     cell growth inhibiting protein 40; proto-oncogene
                     c-ErbB-1; receptor tyrosine-protein kinase erbB-1"
     sig_peptide     1..24
                     /inference="COORDINATES: ab initio prediction:SignalP:4.0"
                     /calculated_mol_wt=2283
     mat_peptide     25..1210
                     /product="epidermal growth factor receptor isoform a"
                     /calculated_mol_wt=132013
     Region          57..168
                     /region_name="Recep_L_domain"
                     /note="Receptor L domain; pfam01030"
                     /db_xref="CDD:250307"
     Region          75..300
                     /region_name="Approximate"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          185..337
                     /region_name="Furin-like"
                     /note="Furin-like cysteine rich region; pfam00757"
                     /db_xref="CDD:250112"
     Site            229
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:21487020};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          231..274
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          361..481
                     /region_name="Recep_L_domain"
                     /note="Receptor L domain; pfam01030"
                     /db_xref="CDD:250307"
     Region          390..600
                     /region_name="Approximate"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Region          505..637
                     /region_name="GF_recep_IV"
                     /note="Growth factor receptor domain IV; pfam14843"
                     /db_xref="CDD:258980"
     Region          506..559
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          558..>598
                     /region_name="FU"
                     /note="Furin-like repeats. Cysteine rich region. Exact
                     function of the domain is not known. Furin is a
                     serine-kinase dependent proprotein processor. Other
                     members of this family include endoproteases and cell
                     surface receptors; cd00064"
                     /db_xref="CDD:238021"
     Region          634..677
                     /region_name="TM_ErbB1"
                     /note="Transmembrane domain of Epidermal Growth Factor
                     Receptor or ErbB1, a Protein Tyrosine Kinase; cd12093"
                     /db_xref="CDD:213054"
     Site            order(644..646,648..653,656..657)
                     /site_type="other"
                     /note="heterodimer interface [polypeptide binding]"
                     /db_xref="CDD:213054"
     Site            646..668
                     /site_type="transmembrane region"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            678
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine, by PKC and PKD/PRKD1.
                     {ECO:0000269|PubMed:10523301}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Region          688..704
                     /region_name="Important for dimerization, phosphorylation
                     and activation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            693
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine, by PKD/PRKD1.
                     {ECO:0000269|PubMed:10523301, ECO:0000269|PubMed:16083266,
                     ECO:0000269|PubMed:18691976, ECO:0000269|PubMed:20068231,
                     ECO:0000269|PubMed:3138233}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            695
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18691976,
                     ECO:0000269|PubMed:3138233}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Region          704..1016
                     /region_name="PTKc_EGFR"
                     /note="Catalytic domain of the Protein Tyrosine Kinase,
                     Epidermal Growth Factor Receptor; cd05108"
                     /db_xref="CDD:270683"
     Region          712..968
                     /region_name="Pkinase_Tyr"
                     /note="Protein tyrosine kinase; pfam07714"
                     /db_xref="CDD:254379"
     Site            order(715..717,728..730,794..795,797,804..805,1009..1010)
                     /site_type="other"
                     /note="dimer interface [polypeptide binding]"
                     /db_xref="CDD:270683"
     Site            order(718..719,722..723,745,791,793,797,841..842,855,
                     876..880,885,889)
                     /site_type="active"
                     /db_xref="CDD:270683"
     Site            order(718..719,726,743,745,766,790..791,793,841..842,844,
                     855)
                     /site_type="other"
                     /note="ATP binding site [chemical binding]"
                     /db_xref="CDD:270683"
     Site            854..879
                     /site_type="other"
                     /note="activation loop (A-loop)"
                     /db_xref="CDD:270683"
     Site            order(876..880,885,889)
                     /site_type="other"
                     /note="polypeptide substrate binding site [polypeptide
                     binding]"
                     /db_xref="CDD:270683"
     Site            991
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:16083266,
                     ECO:0000269|PubMed:18669648, ECO:0000269|PubMed:20068231};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            995
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            998
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:19563760}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1016
                     /site_type="other"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Important for interaction with PIK3C2B; propagated
                     from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1016
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:19563760}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1026
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:16083266};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1039
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1041
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphothreonine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1042
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1064
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:18691976, ECO:0000269|PubMed:20068231};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1069
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine. {ECO:0000305|PubMed:22888118};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1070
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:3138233};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1071
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:3138233};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1081
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18691976};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1092
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:12873986}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1110
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:12873986, ECO:0000269|PubMed:2543678};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
     Site            1166
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphoserine. {ECO:0000269|PubMed:18669648,
                     ECO:0000269|PubMed:18691976}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1172
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:17081983}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1197
                     /site_type="phosphorylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine, by autocatalysis.
                     {ECO:0000269|PubMed:17081983, ECO:0000269|PubMed:18691976,
                     ECO:0000269|PubMed:19563760, ECO:0000269|PubMed:19836242,
                     ECO:0000269|PubMed:20068231}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     Site            1199
                     /site_type="methylation"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Omega-N-methylarginine.
                     {ECO:0000269|PubMed:21258366}; propagated from
                     UniProtKB/Swiss-Prot (P00533.2)"
     CDS             1..1210
                     /gene="EGFR"
                     /gene_synonym="ERBB; ERBB1; HER1; mENA; NISBD2; PIG61"
                     /coded_by="NM_005228.3:247..3879"
                     /note="isoform a precursor is encoded by transcript
                     variant 1"
                     /db_xref="CCDS:CCDS5514.1"
                     /db_xref="GeneID:1956"
                     /db_xref="HGNC:HGNC:3236"
                     /db_xref="MIM:131550"
ORIGIN      
        1 mrpsgtagaa llallaalcp asraleekkv cqgtsnkltq lgtfedhfls lqrmfnncev
       61 vlgnleityv qrnydlsflk tiqevagyvl ialntverip lenlqiirgn myyensyala
      121 vlsnydankt glkelpmrnl qeilhgavrf snnpalcnve siqwrdivss dflsnmsmdf
      181 qnhlgscqkc dpscpngscw gageencqkl tkiicaqqcs grcrgkspsd cchnqcaagc
      241 tgpresdclv crkfrdeatc kdtcpplmly npttyqmdvn pegkysfgat cvkkcprnyv
      301 vtdhgscvra cgadsyemee dgvrkckkce gpcrkvcngi gigefkdsls inatnikhfk
      361 nctsisgdlh ilpvafrgds fthtppldpq eldilktvke itgflliqaw penrtdlhaf
      421 enleiirgrt kqhgqfslav vslnitslgl rslkeisdgd viisgnknlc yantinwkkl
      481 fgtsgqktki isnrgensck atgqvchalc spegcwgpep rdcvscrnvs rgrecvdkcn
      541 llegeprefv enseciqchp eclpqamnit ctgrgpdnci qcahyidgph cvktcpagvm
      601 genntlvwky adaghvchlc hpnctygctg pglegcptng pkipsiatgm vgalllllvv
      661 algiglfmrr rhivrkrtlr rllqerelve pltpsgeapn qallrilket efkkikvlgs
      721 gafgtvykgl wipegekvki pvaikelrea tspkankeil deayvmasvd nphvcrllgi
      781 cltstvqlit qlmpfgclld yvrehkdnig sqyllnwcvq iakgmnyled rrlvhrdlaa
      841 rnvlvktpqh vkitdfglak llgaeekeyh aeggkvpikw malesilhri ythqsdvwsy
      901 gvtvwelmtf gskpydgipa seissilekg erlpqppict idvymimvkc wmidadsrpk
      961 freliiefsk mardpqrylv iqgdermhlp sptdsnfyra lmdeedmddv vdadeylipq
     1021 qgffsspsts rtpllsslsa tsnnstvaci drnglqscpi kedsflqrys sdptgalted
     1081 siddtflpvp eyinqsvpkr pagsvqnpvy hnqplnpaps rdphyqdphs tavgnpeyln
     1141 tvqptcvnst fdspahwaqk gshqisldnp dyqqdffpke akpngifkgs taenaeylrv
     1201 apqssefiga
//
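Without Biopython, one natural first step is to read the sequence out of the ORIGIN block. Each line there is a 1-based position number followed by residues in blocks of ten, and the record ends at `//`. The following is a minimal sketch that assumes one record per file. `parse_origin` is a hypothetical name, and the sample is the first 70 residues of the record above:

```python
import re


def parse_origin(genbank_text):
    """Return the upper-cased sequence from the ORIGIN block of one GenBank record."""
    chunks = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break  # end of the record
        elif in_origin:
            # Keep only letters: drops the leading position number and the spaces.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(chunks).upper()


sample = """ORIGIN
        1 mrpsgtagaa llallaalcp asraleekkv cqgtsnkltq lgtfedhfls lqrmfnncev
       61 vlgnleityv
//
"""
seq = parse_origin(sample)
print(len(seq), seq[:10])  # 70 MRPSGTAGAA
```

Python strings are 0-based, so the residue at GenBank position `p` is `seq[p - 1]`. For example, the phosphorylation site at 229 is `seq[228]` in the full sequence.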
Siva Shanmugam
  • Do you have a question? If so, what is your question? – Robᵩ Oct 14 '15 at 19:55
  • @Robᵩ I'm stuck; I don't know what to use or how to use it. Could you give me an idea of the program flow (searching the string, matching, storing the result in a file, that kind of thing)? – Siva Shanmugam Oct 14 '15 at 20:04
  • Welcome to StackOverflow. Please read and follow the posting guidelines in http://stackoverflow.com/help/mcve, http://stackoverflow.com/help/on-topic, and http://stackoverflow.com/help/dont-ask. This is not a design or coding service. You have not given us a description of the problem you're trying to solve: you've used several undefined terms. You haven't shown any code that is giving you trouble. We can't help until you provide enough information to properly reduce, reproduce, and explain the problem. – Prune Oct 14 '15 at 20:10
  • @Prune I added the file. Please check. – Siva Shanmugam Oct 14 '15 at 20:23
  • For this, it's easier to use `biopython`. Why not use a library made for parsing `genbank` files? – Jose Ricardo Bustos M. Oct 14 '15 at 20:27
  • @JoseRicardoBustosM. Yes, you're right. I have the Biopython code (the link is in the post), but I think it is possible without it, and I want to learn how. – Siva Shanmugam Oct 14 '15 at 20:30
  • @SivaShanmugam you say literally "WITHOUT USING Biopython PACKAGE." Why? – Jose Ricardo Bustos M. Oct 14 '15 at 20:37
  • @JoseRicardoBustosM. I'm just curious how the program flow would look if the package weren't available. It's for learning purposes. – Siva Shanmugam Oct 14 '15 at 20:56
  • Without BioPython, it is simply text parsing. You would use pattern matching (probably with regexes) to identify the appropriate features, and then more pattern matching to find the sequence start and end. There is nothing tricky about it, it is simply tedious and boring to do each feature step by step. Since you're asking to skip the simple, efficient way to do this and rather to use the boring, mindless approach, you're not likely to get much interest in the question. If you're interested in learning how to do it, look at the source code for the biopython genbank parser. – iayork Oct 15 '15 at 11:42
  • @iayork Thank you very much. Looking at the source code is a really good idea. Thanks :) – Siva Shanmugam Oct 15 '15 at 13:18
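The text-parsing approach described in the comments above can be sketched with regular expressions alone. This is a minimal sketch under stated assumptions, not a full GenBank parser. It assumes a single record, feature keys indented by exactly five spaces, and simple protein locations: single positions, ranges, `<`/`>` partial ends, and `order(...)`. Any `join`/`complement` semantics are ignored. The names `parse_features` and `spans` are hypothetical, and the example runs on a short excerpt of the record rather than the whole file:

```python
import re

# A feature key starts in column 6 (five leading spaces), followed by its location.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")
# A qualifier is an indented line starting with "/name" and an optional "=value".
QUALIFIER_RE = re.compile(r"^\s+/([^=\s]+)(?:=(.*))?$")
# Positions inside a location such as "229", "558..>598" or "order(644..646,648..653)".
SPAN_RE = re.compile(r"[<>]?(\d+)(?:\.\.[<>]?(\d+))?")


def parse_features(genbank_text):
    """Parse the FEATURES table into dicts: type, location, qualifiers (name -> list)."""
    features = []
    in_table = False
    current = None  # qualifier being continued; None while the location may still wrap
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line[0].isspace():  # next top-level keyword (e.g. ORIGIN) ends the table
            break
        m = FEATURE_RE.match(line)
        if m:
            features.append({"type": m.group(1), "location": m.group(2).strip(),
                             "qualifiers": {}})
            current = None
            continue
        if not features:
            continue
        quals = features[-1]["qualifiers"]
        m = QUALIFIER_RE.match(line)
        if m:
            current = m.group(1)
            quals.setdefault(current, []).append(m.group(2) or "")
        elif current is None:
            features[-1]["location"] += line.strip()  # wrapped location, no separator
        else:
            quals[current][-1] += " " + line.strip()  # wrapped free-text value
    for f in features:
        for values in f["qualifiers"].values():
            values[:] = [v.strip('"') for v in values]  # drop surrounding quotes
    return features


def spans(location):
    """Return 1-based, inclusive (start, end) pairs; a single residue n gives (n, n)."""
    return [(int(a), int(b or a)) for a, b in SPAN_RE.findall(location)]


sample = """FEATURES             Location/Qualifiers
     Region          558..>598
                     /region_name="FU"
     Site            order(718..719,722..723,745,791,793,797,841..842,855,
                     876..880,885,889)
                     /site_type="active"
     Site            229
                     /site_type="phosphorylation"
                     /note="Phosphoserine. {ECO:0000269|PubMed:21487020};
                     propagated from UniProtKB/Swiss-Prot (P00533.2)"
ORIGIN
"""

features = parse_features(sample)
regions = [f for f in features if f["type"] == "Region"]
phospho = [f for f in features if f["type"] == "Site"
           and "phosphorylation" in f["qualifiers"].get("site_type", [])]
print([spans(f["location"]) for f in regions])  # [[(558, 598)]]
print([spans(f["location"]) for f in phospho])  # [[(229, 229)]]
```

Qualifier values are kept in lists, as Biopython's `f.qualifiers` does. This matters because some features repeat a qualifier, such as `/db_xref` on the CDS.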

1 Answer


I recommend using Biopython:

from Bio import SeqIO

path = "file.gb"
# next() works in both Python 2 and 3; the old .next() method is Python 2 only
gb = next(SeqIO.parse(path, "genbank"))
# Keep "Site" features whose /site_type qualifier is "phosphorylation"
phosphorylation_list = [f for f in gb.features if f.type == "Site" and
                        "phosphorylation" in f.qualifiers.get("site_type", [])]

for f in phosphorylation_list:
    print((int(f.location.start), int(f.location.end)))

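You get the following output. Biopython uses 0-based, half-open coordinates, so the site at position 229 in the file is reported as (228, 229):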

(228, 229)
(677, 678)
(692, 693)
(694, 695)
(990, 991)
(994, 995)
(997, 998)
(1015, 1016)
(1025, 1026)
(1038, 1039)
(1040, 1041)
(1041, 1042)
(1063, 1064)
(1068, 1069)
(1069, 1070)
(1070, 1071)
(1080, 1081)
(1091, 1092)
(1109, 1110)
(1165, 1166)
(1171, 1172)
(1196, 1197)
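The answer above stops at coordinates and does not write the Region FASTA files. Biopython's tuples are 0-based and half-open, so they work directly as Python slice bounds. The GenBank file itself uses 1-based, inclusive positions. Below is a minimal, library-free sketch of both conversions, assuming the sequence is already a plain string. The helper names, the FASTA header layout, and the demo coordinates are hypothetical. The sequence is the first 70 residues of NP_005219:

```python
def region_fasta(seq, record_id, name, start, end, width=60):
    """FASTA text for residues start..end, given 1-based inclusive GenBank positions."""
    sub = seq[start - 1:end]  # 1-based inclusive -> 0-based half-open slice
    lines = [sub[i:i + width] for i in range(0, len(sub), width)]
    # Hypothetical header layout: >record|region_name|start-end
    return ">{}|{}|{}-{}\n{}\n".format(record_id, name, start, end, "\n".join(lines))


def phospho_residues(seq, biopython_spans):
    """Turn Biopython-style (start, end) tuples into (1-based position, residue) pairs."""
    return [(start + 1, seq[start:end]) for start, end in biopython_spans]


# First 70 residues of NP_005219, copied from the ORIGIN block.
seq = ("mrpsgtagaa llallaalcp asraleekkv cqgtsnkltq lgtfedhfls lqrmfnncev "
       "vlgnleityv").replace(" ", "").upper()

# Demo coordinates only; they are not features from the record.
print(region_fasta(seq, "NP_005219.2", "demo_region", 57, 66), end="")
print(phospho_residues(seq, [(3, 4), (5, 6)]))  # [(4, 'S'), (6, 'T')]
```

Each FASTA string can then be written to its own file, for example with `open(name + ".fasta", "w")`. With the full sequence, `phospho_residues(seq, [(228, 229)])` should give `[(229, 'S')]`, which matches the "Phosphoserine" note on Site 229.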
Jose Ricardo Bustos M.