How do I create a sequence logo for aligned DNA sequences? For the sequences given in Kevin Murphy's book (chapter 2, figure 2.5), I am deriving the logo using this wiki_link, but I am not getting the expected results.

DNA Sequences:

  1. a t a g c c g g t a c g g c a
  2. t t a g c t g c a a c c g c a
  3. t c a g c c a c t a g a g c a
  4. a t a a c c g c g a c c g c a
  5. t t a g c c g c t a a g g t a
  6. t a a g c c t c g t a c g t a
  7. t t a g c c g t t a c g g c c
  8. a t a t c c g g t a c a g t a
  9. a t a g c a g g t a c c g a a
  10. a c a t c c g t g a c g g a a
NKR
  • What results did you expect, what results did you get, and what does a [Minimal, Complete, Verifiable Example](https://stackoverflow.com/help/mcve) of your program look like? Note that despite the claim in the caption the example logo in figure 2.5(b) of that PDF clearly does not represent the sequences shown in 2.5(a), so you shouldn't expect your program to produce that logo. The text also does not match: it says column 7 is all *G*s but in fact that column contains a *T*. Apparently this is a known error in the book. See http://www.cs.ubc.ca/~murphyk/MLbook/errata.html for more errors. – ottomeister Jan 29 '18 at 23:26
  • I have computed the size of each character from its relative frequency. For instance, the 5th position is C in every sequence (probability 1), so the corresponding character in the sequence logo should be a big C. Similarly, the 13th position should be a big G, but not the 7th. Thanks for sharing the errata link. :) – NKR Jan 30 '18 at 05:05
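The letter heights discussed in these comments can be checked with a short script. This is a minimal sketch, assuming the usual sequence-logo definition: the information content of a column is log2(4) minus its Shannon entropy, optionally minus the small-sample correction e_n = (s - 1) / (2 ln 2 · n), and each letter's height is its relative frequency times that information content. The function name `letter_heights`, its parameters, and the clamping of negative values to zero are choices made for this illustration, not part of any library.

```python
import math
from collections import Counter

# The ten aligned sequences from the question, with spaces removed.
SEQUENCES = [
    "atagccggtacggca",
    "ttagctgcaaccgca",
    "tcagccactagagca",
    "ataaccgcgaccgca",
    "ttagccgctaaggta",
    "taagcctcgtacgta",
    "ttagccgttacggcc",
    "atatccggtacagta",
    "atagcaggtaccgaa",
    "acatccgtgacggaa",
]


def letter_heights(seqs, alphabet="acgt", small_sample_correction=True):
    """Return one dict per column mapping each base to its height in bits."""
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Small-sample correction e_n = (s - 1) / (2 ln 2 * n), s = alphabet size.
    e_n = 0.0
    if small_sample_correction:
        e_n = (len(alphabet) - 1) / (2 * math.log(2) * n)

    heights = []
    for column in zip(*seqs):
        counts = Counter(column)
        freqs = {base: counts[base] / n for base in alphabet}
        # Shannon entropy of the column in bits; 0 * log2(0) is treated as 0.
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        # Information content; clamped at 0 (an illustrative choice) so the
        # correction cannot produce negative heights.
        info = max(0.0, math.log2(len(alphabet)) - (entropy + e_n))
        heights.append({base: f * info for base, f in freqs.items()})
    return heights


if __name__ == "__main__":
    # Print the tallest letter and its height for each position (1-based).
    for pos, col in enumerate(letter_heights(SEQUENCES), start=1):
        base, height = max(col.items(), key=lambda kv: kv[1])
        print(f"{pos:2d} {base} {height:.2f}")
```

With `small_sample_correction=False`, positions 5 and 13 reach the full 2 bits because every sequence has the same base there, while position 7 comes out lower because it contains an A and a T besides the Gs.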

2 Answers

If you don't need to develop your own version, there is a Python library that solves this problem:

https://pypi.python.org/pypi/weblogo

or the web version

http://weblogo.berkeley.edu/logo.cgi
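Either option needs the aligned sequences as input, and FASTA is a common format for that. The sketch below writes the sequences to a FASTA file using only the standard library. The helper name `write_fasta`, the file name `seqs.fa`, and the `seq1`, `seq2`, ... record names are hypothetical choices for this example.

```python
from pathlib import Path


def write_fasta(seqs, path, prefix="seq"):
    """Write aligned sequences to `path` as FASTA, one record per sequence."""
    # Drop the spaces used in the question's listing and normalise the case.
    cleaned = [s.replace(" ", "").upper() for s in seqs]
    if len({len(s) for s in cleaned}) > 1:
        raise ValueError("sequences must be aligned to equal length")

    lines = []
    for i, seq in enumerate(cleaned, start=1):
        lines.append(f">{prefix}{i}")  # FASTA header line
        lines.append(seq)              # sequence line
    Path(path).write_text("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # First two sequences from the question, as an example.
    write_fasta(
        ["a t a g c c g g t a c g g c a", "t t a g c t g c a a c c g c a"],
        "seqs.fa",
    )
```

The resulting file can be pasted into the web form. With the library installed, a command along the lines of `weblogo -f seqs.fa -o logo.eps -F eps` should produce a logo; treat those flags as an assumption and confirm them with `weblogo --help`.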

Colin Anthony
You can do it using WebLogo, as mentioned above. Here is a little code to do it in Python with Biopython:

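from Bio.Seq import Seq
from Bio import motifs

# df is assumed to be an existing pandas DataFrame whose 'binding' column
# holds aligned, equal-length DNA strings; any list of such strings works.
# Wrapping each string in Seq also covers Biopython versions that do not
# accept plain strings here.
instances = [Seq(s) for s in df['binding']]
m = motifs.create(instances)
# Sends the motif to the WebLogo web service (needs an internet connection)
# and saves the returned image.
m.weblogo('logo.png')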

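Here you have to provide instances as a list of aligned DNA sequences, and the result is saved as logo.png. For another image format such as JPEG, changing the file extension alone is not enough: the weblogo method also takes a format argument, so check the Biopython documentation for its name and supported values in your version.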

Kay