
I have a file containing multiple protein amino acid sequences. I would like to visualise the locations of the amino acid motifs PQG, QQG and AQG within each sequence as a heatmap in R. Each motif should appear as a differently coloured bar at every position where it occurs, like a barcode. Example sequences follow:

>cat archae.fasta
>A0A075FKY9
MRRITLPKGIEISGKSPFGTSSGVYEVVISISENKPSDASIASNSFNSLWKDNFHLRVSG
DHFSQVLGSGSNPIPDNVFDLGSVWIVIQDQFSPVHTSFQFNISPSNQPASKPEEKPRRV
ETSTVKRTITKPGRLGPPGGKGPRGPSGYPGIKGDKGGRGPTGDKGDKGDKGIPGPQGEK
GKTGQAGDKGDKGITGPPGEKGPRGSTGPPGDKGDKGVQGKIGDKGLTGTTGPVGDKGTQ
GPQGPPGGKGLTGIPGPQGDKGTKGPLGPVGERGPTGKPGESGLQGPQGIQGAQGERGHS
GPPGPQGEKGLIGDKGEIGERGTRGPPGPPGEKGSQGGMSDEGKRLIKELLELLASKNII
STEEQIKLTSYLY
>A0A075FMB0
MPGDKGTSGIRGVPGDKGDKGPQGPPGDKGLTGPVGVPGEKGPTGTSGVQGEKGIQGTPG
PLGEKGVTGPAGDKGPIGPPGPAGSKGITGPAGPLGDKGTIGPPGPLGDKGSKGPEGPTG
DKGPQGPQGPAGAKGLTGVPGPQGEKGEKGPLGPIGEKGTTGATGPPGDKGPQGPQGTQG
ERGTTGPSGEKGPQGPQGIQGPQGERGPTGPIGSIGETGPIGGQGPVGPAGPRGPPGPPG
EKGPSGGMSSEQKAIFKELLEILTTKEIISTEEQIKLMSYLY
>A0A075G2F8
MPKGIEISGKSPFGTSSGVYEVKISVSQSKPSNDSITNNSFNSLWKDNFHLRVSEGYFSE
ILGSDSNPIPDNVFDLGSVWIVIQDQFSPVHTSFEFKISQDSKSIPTPKEKPRRVDTSSV
KKVRTIPTRQGPTGSKGIPGERGYPGTKGEKGGKGPTGDKGDKGDKGVPGPQGEKGKTGS
TGDKGDKGITGSPGEKGPRGPTGPPGDKGDKGVQGKIGDKGLTGTTGPIGDKGPQGPQGP
PGDKGLTGIPGPQGDKGGRGTIGPSGERGPTGRPGESGLQGPQGIQGPQGERGHSGPPGP
QGEKGIIGDKGEIGERGPRGPPGPPGEKGSQGGMSDDGKILFKELLQLLASKNIISTEEQ
IKLTSYLY
>A0A075GLR7
MPKGIEISGKSPFGTSSGVYEVRISISQDKPSDDGITNNSFNSLWKDNFHLRVSEGHFSE
ILGTDSNPIPADVFGLGSIWIIIQDQFSPVHTSFELHISQDNKLLSVPQGKPKPVNATKV
QKVRAQPGRPGPQGEKGGKGPAGYPGTRGEKGGRGITGDKGDKGDKGIPGPQGEKGIKGP
AGDKGDKGITGPVGPSGPKGSTGPPGDKGDKGVQGKSGDKGLTGVTGPLGDKGTQGPQGP
PGPKGLTGVPGPQGDKGMRGSIGPAGERGPTGKPGDPGLPGPQGIQGPQGERGHTGPQGP
PGEKGLVGDKGEIGDRGPRGPPGPPGEKGSQGGMSEESKRLVKELLELLASKNIISTEEQ
IKLTSYLY

As an example, please see a two-colour heatmap. I need something similar but with three colours corresponding to each of the three motifs.

[Image: example two-colour heatmap]

I looked at the heatmaply package, but the biggest problem is that my matrix is not numeric. Is there a way to make a heatmap from text strings in R?

jared_mamrot
Jalan
  • Hi Jalan, you need to give us the x and y axis for your heatmap. Also what is "motifs"? – Stephan Jun 03 '22 at 06:50
  • This type of domain-specific question is better suited to https://bioinformatics.stackexchange.com/ (perhaps there's a software package to create your motif heatmap; something like https://academic.oup.com/bioinformatics/article/38/9/2624/6535228? The folks over there would know) – jared_mamrot Jun 03 '22 at 06:54
  • @Stephan: Apologies. A motif is defined as any three-character string in the uploaded sequence that matches one of PQG, QQG or AQG. The y-axis is the FASTA code appended after > (A0A075FKY9, for example). The x-axis would depend on how the data is converted into a matrix. Since – Jalan Jun 03 '22 at 08:57
  • @jared_mamrot: Thanks. I will post it there. The link looks very interesting, thanks. I will post if it works. – Jalan Jun 03 '22 at 09:10
  • You could color code your letters as numbers - and then use heatmaply. – Tal Galili Jun 28 '22 at 09:04
  • @TalGalili: I didn't get you. Should I label motifs PQG, QQG and AQG as (say) 1, 2 and 3? – Jalan Jun 29 '22 at 07:41
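
The numeric-coding idea from the last two comments can be sketched in base R as follows. This is only a minimal illustration, not a tested solution for the full file: the two toy sequences, the codes 1/2/3, the colours, and the helper names `motif_matrix` and `plot_motifs` are all assumptions made for the example. Each row is a sequence, each column is a residue position, and a cell holds 0 (no motif) or the code of the motif covering that position. Sequences shorter than the longest one are padded with 0.

```r
# Toy input (hypothetical sequences, not taken from archae.fasta)
seqs <- c(
  seqA = "MPQGAAQQGKAQG",
  seqB = "PQGPQGTTAQGQQG"
)

# Assumed coding: one integer per motif, 0 is reserved for "no motif"
motifs <- c(PQG = 1, QQG = 2, AQG = 3)

# Build a sequences x positions matrix of motif codes
motif_matrix <- function(seqs, motifs) {
  m <- matrix(0, nrow = length(seqs), ncol = max(nchar(seqs)),
              dimnames = list(names(seqs), NULL))
  for (i in seq_along(seqs)) {
    for (motif in names(motifs)) {
      # Start positions of every literal occurrence, or -1 if there is none
      starts <- gregexpr(motif, seqs[[i]], fixed = TRUE)[[1]]
      if (starts[1] == -1) next
      for (s in starts) {
        # Mark all residues covered by this occurrence
        m[i, s:(s + nchar(motif) - 1)] <- motifs[[motif]]
      }
    }
  }
  m
}

# Draw the matrix with one colour per code (background, PQG, QQG, AQG)
plot_motifs <- function(m, cols = c("grey95", "red", "blue", "darkgreen")) {
  # image() puts the rows of z along the x-axis, hence the transpose
  image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
        col = cols, breaks = c(-0.5, 0.5, 1.5, 2.5, 3.5),
        xlab = "Position in sequence", ylab = "", yaxt = "n")
  axis(2, at = seq_len(nrow(m)), labels = rownames(m), las = 1)
}

m <- motif_matrix(seqs, motifs)
print(m)

# Only draw when run in an interactive session
if (interactive()) plot_motifs(m)
```

Because the matrix is now numeric, it could presumably also be handed to heatmaply with a four-colour palette. Row and column clustering would likely need to be switched off there so that the residue positions stay in order.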
