I am trying to write a function in R that calculates the frequency of each codon. Methionine is an amino acid encoded by only one codon, ATG, so its proportion in any sequence is 1. Glycine, on the other hand, can be encoded by GGT, GGC, GGA or GGG, so the proportion of each of those codons is 0.25. The input would be a DNA sequence such as ATGGGTGGCGGAGGG, and with the help of a codon table the function should calculate the proportion of each codon that occurs in the input.
Please help me by suggesting ways to write this function.
For example, if my argument is ATGTGTTGCTGG, then my result would be
ATG=1
TGT=0.5
TGC=0.5
TGG=1
Data for R:
codon <- list(ATA = "I", ATC = "I", ATT = "I", ATG = "M", ACA = "T",
ACC = "T", ACG = "T", ACT = "T", AAC = "N", AAT = "N", AAA = "K",
AAG = "K", AGC = "S", AGT = "S", AGA = "R", AGG = "R", CTA = "L",
CTC = "L", CTG = "L", CTT = "L", CCA = "P", CCC = "P", CCG = "P",
CCT = "P", CAC = "H", CAT = "H", CAA = "Q", CAG = "Q", CGA = "R",
CGC = "R", CGG = "R", CGT = "R", GTA = "V", GTC = "V", GTG = "V",
GTT = "V", GCA = "A", GCC = "A", GCG = "A", GCT = "A", GAC = "D",
GAT = "D", GAA = "E", GAG = "E", GGA = "G", GGC = "G", GGG = "G",
GGT = "G", TCA = "S", TCC = "S", TCG = "S", TCT = "S", TTC = "F",
TTT = "F", TTA = "L", TTG = "L", TAC = "Y", TAT = "Y", TAA = "stop",
TAG = "stop", TGC = "C", TGT = "C", TGA = "stop", TGG = "W")
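One possible way to approach this, as a minimal sketch rather than a definitive answer: split the sequence into triplets, count each codon, and divide each count by the total count of all codons in the input that encode the same amino acid. This assumes the "frequency" wanted is the observed share among synonymous codons in the input, which reproduces both examples above. If the intended value is instead 1 divided by the number of synonymous codons in the table, the grouping would have to be done over the table rather than over the input. The names `codon_table` and `codon_usage` are made up for illustration. The table is rebuilt compactly here so the block runs on its own; it is meant to hold the same mapping as the `codon` list above, and `unlist(codon)` could be passed as `code` instead.

```r
# Standard genetic code as a named character vector (codon -> amino acid).
# Codons are generated in T, C, A, G order to match the amino acid string.
bases <- c("T", "C", "A", "G")
# expand.grid varies its first column fastest, so the third base goes first.
g <- expand.grid(b3 = bases, b2 = bases, b1 = bases, stringsAsFactors = FALSE)
aa <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                      "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
aa[aa == "*"] <- "stop"  # same label as in the list above
codon_table <- setNames(aa, paste0(g$b1, g$b2, g$b3))

# Hypothetical helper: share of each codon among the synonymous codons
# that actually occur in `dna`.
codon_usage <- function(dna, code = codon_table) {
  dna <- toupper(dna)
  if (nchar(dna) == 0 || nchar(dna) %% 3 != 0) {
    stop("sequence length must be a positive multiple of 3")
  }
  # Split into non-overlapping triplets, reading frame starting at position 1.
  starts <- seq(1, nchar(dna), by = 3)
  cods <- substring(dna, starts, starts + 2)
  if (!all(cods %in% names(code))) {
    stop("sequence contains codons that are not in the table")
  }
  counts <- table(cods)
  n <- setNames(as.vector(counts), names(counts))  # count per distinct codon
  amino <- code[names(n)]                          # amino acid per distinct codon
  freq <- n / ave(n, amino, FUN = sum)             # divide by per-amino-acid total
  freq[unique(cods)]                               # order of first appearance
}

codon_usage("ATGTGTTGCTGG")
# ATG = 1, TGT = 0.5, TGC = 0.5, TGG = 1
```

With this definition a codon that appears twice as often as its synonym gets 2/3 rather than 1/2, so the result reflects usage in the given sequence and not just the degeneracy of the code.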