4

I'm trying to draw a matrix according to global alignment algorithm (or Needleman–Wunsch algorithm) in Python.

I don't know if matplotlib is the best tool for this case. I tried to use Bokeh but the structure was so difficult to fit a matrix as I wanted.

I'm using Bio.SeqIO (the standard Sequence Input/Output interface for BioPython) to store two sequences.

I what to get a result similar to this image:

enter image description here

Is that possible in Matplotlib? How can I do that?

UPDATE

Finally I was able to construct the algorithm from the answer given by ImportanceOfBeingErnest. Here is the result:

enter image description here

Here is the gist for this implementation: plot_needleman_wunsch.py

And here is the whole project (Work in progress): bmc-sequence-alignment

Kevin Hernández
  • 704
  • 10
  • 25
  • if you add your two sequences of data, I'll give it a shot. – Max Power Apr 29 '17 at 03:48
  • `seq_alpha = Seq("GATCCA")` and `seq_beta = Seq("GTGCCT")` – Kevin Hernández Apr 29 '17 at 04:12
  • sorry I'm not a biostats guy. can you reproduce where the numbers and arrows (color and direction) come from? matplotlib definitely doesn't have something built in to figure that out. but if you can provide that, I can try to plot this – Max Power Apr 29 '17 at 04:14
  • I'm not a biostats either, but I'm going to try to explain with examples. The colors of the arrows do not interest me, but the directions. To understand where the arrows come from please read the sections **3.1.1** and **3.1.2** from this [tutorial](http://etutorials.org/Misc/blast/Part+II+Theory/Chapter+3.+Sequence+Alignment/3.1+Global+Alignment+Needleman-Wunsch/). Briefly, I have two arrays of size m and n. The matrix is ​​(m + 1 x n + 1). Then fill the first row and the first column with "gaps". And to get the directions ask what is the maximum value between the diagonal, the left and up – Kevin Hernández Apr 29 '17 at 05:26
  • "To understand where the arrows come from please read the sections 3.1.1 and 3.1.2 from this tutorial." I'm not interested in reading up on how to reproduce your data. If you provided the data you wanted plotted, someone would probably plot it for you. – Max Power Apr 29 '17 at 05:31
  • Ok, you are right. Maybe you know how to do this: I draw an arrow diagonally horizontally or vertically from the number that is in the middle of the cell. I tried to do it, but I did not succeed. This is my try: http://imgur.com/wI1pS5Q – Kevin Hernández Apr 29 '17 at 06:02
  • if you made that imgur plot with matplotlib, you should post that code, along with the data used to generate it. although the arrows have no color and the cell values are 0/X instead of numbers like your first plot, so it still wouldn't be enough to reproduce the original plot you said you want. – Max Power Apr 29 '17 at 06:17

1 Answers1

8

There is no clear description of the algorithm to place the arrows in the question; therefore this answer focusses on the way to procude a similar plot in matplotlib.

enter image description here

The idea here is to place the numbers at integer positions in the plot and draw minor gridlines at n+0.5 to obtain the table-like appearance. The arrows are drawn as annotations between positions defined in a 4-column array (first 2 column: x and y of the start of the arrow, third and fourth column: x,y of arrow's end).

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np;np.random.seed(5)
plt.rcParams["figure.figsize"] = 4,5
param = {"grid.linewidth" : 1.6,
         "grid.color"     : "lightgray",
         "axes.linewidth" : 1.6,
         "axes.edgecolor" : "lightgray"}
plt.rcParams.update(param)

#Data
headh = list("GATCCA")
headv = list("GTGCCT")

v = np.zeros((7,7), dtype=int)
v[1:,1:] = np.random.randint(-2,7, size=(6,6))

arrows = np.random.randint(0,v.shape[1], size=(14,4))
opt = np.array([(0,1),(1,0),(1,1)])
arrows[:,2:] = arrows[:,:2] + opt[np.random.randint(0,3,size=14 )]

arrowsb = np.random.randint(0,v.shape[1], size=(7,4))
optb = np.array([(0,1),(1,0),(1,1)])
arrowsb[:,2:] = arrowsb[:,:2] + optb[np.random.randint(0,3,size=7 )]

#Plot
fig, ax=plt.subplots()
ax.set_xlim(-1.5, v.shape[1]-.5 )
ax.set_ylim(-1.5, v.shape[0]-.5 )
ax.invert_yaxis()
for i in range(v.shape[0]):
    for j in range(v.shape[1]):
        ax.text(j,i,v[i,j], ha="center", va="center")
for i, l in enumerate(headh):
    ax.text(i+1,-1,l, ha="center", va="center", fontweight="semibold")
for i, l in enumerate(headv):
    ax.text(-1,i+1,l, ha="center", va="center", fontweight="semibold")

ax.xaxis.set_minor_locator(ticker.FixedLocator(np.arange(-1.5, v.shape[1]-.5,1)))
ax.yaxis.set_minor_locator(ticker.FixedLocator(np.arange(-1.5, v.shape[1]-.5,1)))
plt.tick_params(axis='both', which='both', bottom='off', top='off', 
                left="off", right="off", labelbottom='off', labelleft='off')
ax.grid(True, which='minor')


arrowprops=dict(facecolor='crimson',alpha=0.5, lw=0, 
                shrink=0.2,width=2, headwidth=7,headlength=7)
for i in range(arrows.shape[0]):
    ax.annotate("", xy=arrows[i,2:], xytext=arrows[i,:2], arrowprops=arrowprops)
arrowprops.update(facecolor='blue')
for i in range(arrowsb.shape[0]):
    ax.annotate("", xy=arrowsb[i,2:], xytext=arrowsb[i,:2], arrowprops=arrowprops)
plt.show()
ImportanceOfBeingErnest
  • 321,279
  • 53
  • 665
  • 712