I have a matrix of size 500 X 28000, which contains a lot of zeros in between. But let us consider a working example with the matrix A:

A = [[0, 0, 0, 1, 0],
    [1, 0, 0, 2, 3],
    [5, 3, 0, 0, 0],
    [5, 0, 1, 0, 3],
    [6, 0, 0, 9, 0]]

I would like to plot a heatmap of the above matrix, but since it contains a lot of zeros, the heatmap is almost entirely white space, as seen in the figure below.

How can I ignore the zeros in the matrix and plot the heatmap?

Here is the minimal working example that I tried:

import matplotlib.pyplot as plt
import pylab as pl  # pl is pylab imported as pl
from matplotlib.colors import LogNorm
im = plt.matshow(A, cmap=pl.cm.hot, norm=LogNorm(vmin=0.01, vmax=64), aspect='auto')
plt.colorbar(im)
plt.show()

which produces:

[image: the resulting heatmap, almost entirely white]

As you can see, the white spaces appear because of the zeros.

But my original matrix of size 500X280000 contains a lot of zeros, which makes my colormap almost white!!

Rangooski
  • I am not convinced removing data just because the visualisation isn't clear is ever the right course of action. Have you considered grouping the data, or looking for a different type of plot? It might help if you could tell us more about the nature of the data you are working with. – Nelewout Aug 05 '16 at 13:34
  • I could not think of anything else other than a colorplot to visually represent my data. If there is any other way of representing it, do let me know. – Rangooski Aug 05 '16 at 13:35
  • You could try to use hierarchical clustering before plotting the heatmap. – GWW Aug 05 '16 at 13:37
  • @GWW Could you explain a bit more? – Rangooski Aug 05 '16 at 13:40
  • `norm=np.LogNorm(vmin=0.01, vmax=64)`. Where do you get LogNorm from? If you remove it, you get black colors for the zeroes instead. Why are you using a logarithmic scale? Is it needed? Because of log 0, I mean... – Luis Aug 05 '16 at 13:43
  • What do you mean exactly by 'ignoring'? Are you saying that the non-zero elements are so rare and are not visible because one element does not even occupy a single pixel for the 500x280000 data set? If so, how about plotting a fixed-size marker at the positions of non-zero elements, having readers understand that the values of the other elements not at the center of the markers are all zero? – norio Aug 05 '16 at 13:56
  • @norio: yes the non-zero elements are rare in each array. Could you please elaborate on the `plotting a fixed-size marker at the positions of non-zero elements` as to how I can achieve it. – Rangooski Aug 05 '16 at 13:59
  • O.K. Instead of `matshow`, how about using, for example, `plot(idxrow, idxcol, mfc=color_valmat, mec=color_valmat, markersize=10)`, where `idxrow` and `idxcol` are vectors of 140,000,000(=500*280000) elements, containing the indices of the row and column, respectively, of all the matrix elements, and `color_valmat` is the color representing the value of the matrix elements. It's a bit difficult to explain in words. Does it make sense to you? – norio Aug 05 '16 at 14:10
  • Since I have 500 rows and 28000 columns, how do I `plt.(row,column)` as it might say the dimensions are not the same. What do you mean by indices of the row and columns of the matrix? – Rangooski Aug 05 '16 at 14:15
  • I mean that we consider your matrix `A` as a data set of 140,000,000(=500*280000) elements. Each element has the attributes of row index, column index, and value. In other words, we consider, conceptually (I'm not saying we do this in a python code), a data set `ds = {A[0,0], A[0,1], .., A[0,279999], A[1,0], A[1,1], .., A[1,279999], ..., A[499, 279999]}`. The k-th element of `ds` corresponds to `A[i,j]` with some `i` and `j`. Then, the 'row index' of `ds[k]` is `i`, the 'column index' of `ds[k]` is `j`, and the value of `ds[k]` is `A[i,j]`. The `0<=row_index[k]<500`, but `0<=k<140,000,000`. – norio Aug 05 '16 at 14:37
  • Can you please post your above comment as the answer for my example matrix `A`? I can very much understand your `ds`, but still not completely the row index and column index in plotting it. – Rangooski Aug 05 '16 at 14:46

4 Answers

If you remove the LogNorm, you get black squares instead of white:

im = plt.matshow(A, cmap=plt.cm.hot, aspect='auto')  # same call as in the question, without norm=LogNorm(...)
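
A possible explanation for the colour change, offered as a sketch rather than a statement about the asker's exact setup: `LogNorm` cannot map zero (log 0 is undefined), so Matplotlib masks those cells and they get no colour from the colormap, which is why the white background shows through. With the default linear norm, zero is simply the lowest value of `hot`, which is black. The variable names `log_scaled`, `masked_cells` and `lin_scaled` below are illustrative only.

```python
import numpy as np
from matplotlib.colors import LogNorm, Normalize

A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]], dtype=float)

# With LogNorm, the zero entries come back masked (no colour is assigned).
log_scaled = LogNorm(vmin=0.01, vmax=64)(A)
masked_cells = np.ma.getmaskarray(log_scaled)

# With a linear norm, zero maps to 0.0, the bottom of the colormap.
lin_scaled = Normalize(vmin=A.min(), vmax=A.max())(A)

print(masked_cells.sum(), "cells masked by LogNorm")
print(lin_scaled[0, 0], "is the linear-norm value of a zero cell")
```

So both pictures show the same thing: the zeros dominate, and only their colour differs.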



Edit

In a colormap you always have the complete grid filled with values. That's why you actually create the grid: You account for (say: interpolate) all the points that are not exactly in the grid. That means that your data has many zeroes and that the graph correctly reflects that by looking white (or black). By ignoring those values, you create a misleading graph, if you don't have a clear reason to do so.

If the non-zero values are the ones of interest to you, then you need another type of diagram, as pointed out in norio's comment. For that, you may want to have a look at this answer.


Edit 2

Adapted from this answer

You can treat the values as 1D arrays and plot the points independently, instead of filling a mesh with non-desired values.

import numpy as np
import matplotlib.pyplot as plt

A = [[0, 0, 0, 1, 0],
    [1, 0, 0, 2, 3],
    [5, 3, 0, 0, 0],
    [5, 0, 1, 0, 3],
    [6, 0, 0, 9, 0]]
A = np.array(A)
lenx, leny = A.shape

# Convert 2D to 3*1D: xx is the first index of A, yy the second, zz the value
xx = np.array( [ a for a in range(lenx) for b in range(leny) ] )
yy = np.array( [ b for a in range(lenx) for b in range(leny) ] )
zz = np.array( [ A[x][y] for x,y in zip(xx,yy) ] )
#---
xx = xx[zz!=0]    # Drop zeroes (zz is filtered last so the mask stays valid)
yy = yy[zz!=0]
zz = zz[zz!=0]
#---
# Weighted 2D histogram; the removed 'normed' keyword is left out (default is unnormalised)
zi, yi, xi = np.histogram2d(yy, xx, bins=(10,10), weights=zz)
zi = np.ma.masked_equal(zi, 0)

fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi, edgecolors='black')
scat = ax.scatter(xx, yy, c=zz, s=200)
fig.colorbar(scat)
ax.margins(0.05)

plt.show()


Luis
  • The OP is asking to remove the contribution of zeros itself, here you still have it shown by black. – Srivatsan Aug 05 '16 at 13:49
  • @Luis. Yes, I tried it before posting here. Still, I am not able to visualise the heatmap since my matrix is really big. It shows a black figure with little red dots. – Rangooski Aug 05 '16 at 13:56
  • The thing is, by definition, in a color map you have a complete grid, where **all** points will be plotted. If this is not the case, you need another type of plot. See norio's comment above. – Luis Aug 05 '16 at 14:01

This answer is in the same direction as 'Edit 2' section of Luis' answer. In fact, this is a simplified version of it. I am posting this just in order to correct my misleading statements in my comments. I saw a warning that we should not discuss in the comment area, so I am using this answering area.

Anyway, first let me post my code. Please note that I used a larger matrix randomly generated inside the script, instead of your sample matrix A.

#!/usr/bin/python
#
# This script was written by norio 2016-8-5.

import os, re, sys, random
import numpy as np

#from matplotlib.patches import Ellipse
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.image as img

mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.markeredgewidth'] = 1.0
mpl.rcParams['axes.formatter.limits'] = (-4,4)
#mpl.rcParams['axes.formatter.limits'] = (-2,2)
mpl.rcParams['axes.labelsize'] = 'large'
mpl.rcParams['xtick.labelsize'] = 'large'
mpl.rcParams['ytick.labelsize'] = 'large'
mpl.rcParams['xtick.direction'] = 'out'
mpl.rcParams['ytick.direction'] = 'out'


############################################
#numrow=500
#numcol=280000
numrow=50
numcol=28000
# .. for testing
numelm=numrow*numcol
eps=1.0e-9
#
#numnz=int(1.0e-7*numelm)
numnz=int(1.0e-5*numelm)
# .. for testing
vmin=1.0e-6
vmax=1.0
outfigname='stackoverflow38790536.png'
############################################

### data matrix
# I am generating a data matrix here artificially.
print('generating pseudo-data..')
random.seed('20160805')
matA=np.zeros((numrow, numcol))
for je in range(numnz):
    # integer indices: float indices from random.uniform are rejected by current NumPy
    jr = random.randrange(numrow)
    jc = random.randrange(numcol)
    matA[jr,jc] = random.uniform(vmin,vmax)


### Actual processing for a given data will start from here
print('processing..')

# collect row index, column index and |value| of every non-zero element
idxrow=[]
idxcol=[]
val=[]
for ii in range(numrow):
    for jj in range(numcol):
        if np.abs(matA[ii,jj])>eps:
            idxrow.append(ii)
            idxcol.append(jj)
            val.append( np.abs(matA[ii,jj]) )

print('len(idxrow)=', len(idxrow))
print('len(idxcol)=', len(idxcol))
print('len(val)=',    len(val))


############################################
# canvas setting for line plots 
############################################

f_size   = (8,5)

a1_left   = 0.15
a1_bottom  = 0.15
a1_width  = 0.65
a1_height = 0.80
#
hspace=0.02
#
ac_left   = a1_left+a1_width+hspace
ac_bottom = a1_bottom
ac_width  = 0.03
ac_height = a1_height

############################################
# plot 
############################################
print('plotting..')

fig1=plt.figure(figsize=f_size)
# 'axisbg' was removed from Matplotlib; 'facecolor' is its replacement
ax1 =plt.axes([a1_left, a1_bottom, a1_width, a1_height], facecolor='w')

pc1=plt.scatter(idxcol, idxrow, s=20, c=val, cmap=mpl.cm.gist_heat_r)
# cf.
# http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
plt.xlabel('Column Index', fontsize=18)
plt.ylabel('Row Index', fontsize=18)
ax1.set_xlim([0, numcol-1])
ax1.set_ylim([0, numrow-1])

# separate axes for the colorbar, placed to the right of the main plot
axc =plt.axes([ac_left, ac_bottom, ac_width, ac_height], facecolor='w')
mpl.colorbar.Colorbar(axc,pc1, ticks=np.arange(0.0, 1.5, 0.1) )

plt.savefig(outfigname)
plt.close()

This script outputs a figure, 'stackoverflow38790536.png', which is a scatter plot of the non-zero elements.

As you can see in my code, I used scatter instead of plot. I realized that the plot command is not the most suitable one for the task here.

Another statement of mine that I need to correct is that row_index does not need to have as many as 140,000,000(=500*280000) elements. It only needs to hold the row indices of the non-zero elements. More precisely, the lists idxrow, idxcol, and val, which are passed to the scatter command in the code above, have lengths equal to the number of non-zero elements.
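
As a side note, the double loop that fills these three lists can probably be written more compactly with NumPy indexing. The following is only a sketch under that assumption; the helper name `nonzero_triplets` and the tiny test matrix are made up for illustration and are not part of the script above.

```python
import numpy as np

def nonzero_triplets(mat, eps=1.0e-9):
    """Return row indices, column indices and |values| of entries with |value| > eps."""
    mat = np.asarray(mat)
    rows, cols = np.nonzero(np.abs(mat) > eps)
    return rows, cols, np.abs(mat[rows, cols])

# small illustrative matrix (hypothetical data)
matA = np.zeros((4, 6))
matA[0, 2] = 0.5
matA[3, 1] = -0.25
matA[2, 5] = 1.0e-12   # below eps, so it is treated as zero

idxrow, idxcol, val = nonzero_triplets(matA)
print(len(idxrow), len(idxcol), len(val))
```

The three arrays have one entry per non-zero element, so they can be passed to scatter in the same way as the lists in the script.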

Please note that both of these points have been correctly taken care of in Luis' answer.

norio

You can set the zeroes to float("nan") and plot after that; this works for me.
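
A minimal sketch of what this could look like for the example matrix from the question. The array has to be of float type to hold NaN, and the name `A_nan` is just an illustrative choice.

```python
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]], dtype=float)   # float dtype so that NaN can be stored

# copy of A in which every zero is replaced by NaN
A_nan = np.where(A == 0, np.nan, A)

fig, ax = plt.subplots()
im = ax.matshow(A_nan, cmap="hot", aspect="auto")   # NaN cells are left uncoloured
fig.colorbar(im)
# call plt.show() or fig.savefig(...) to look at the result
```

The colour scale is then computed from the non-zero values only, while the former zero cells stay blank.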

abe

Although norio's answer is correct, I think one can give a much quicker, more to-the-point answer with only a few lines of code:

import numpy as np
import matplotlib.pyplot as plt
A = np.asarray(A)
x,y = A.nonzero() #get the notzero indices
plt.scatter(x,y,c=A[x,y],s=100,cmap='hot',marker='s') #adjust the size to your needs
plt.colorbar()
plt.show()


Note that the axes are inverted. You could invert them with:

ax=plt.gca()
ax.invert_xaxis()
ax.invert_yaxis()

Also note that you have much more flexibility now:

  • You can optionally set the marker size, the marker type, and the transparency.
  • This procedure is faster, as the zeros are not passed to matplotlib.
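
If the goal is a picture laid out like `matshow` (columns along x, row 0 at the top), one option is to pass the column indices as x and the row indices as y and flip only the y-axis. This is a sketch for the example matrix from the question; the `alpha` value and the axis labels are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[0, 0, 0, 1, 0],
              [1, 0, 0, 2, 3],
              [5, 3, 0, 0, 0],
              [5, 0, 1, 0, 3],
              [6, 0, 0, 9, 0]])

rows, cols = A.nonzero()   # indices of the non-zero entries

fig, ax = plt.subplots()
sc = ax.scatter(cols, rows, c=A[rows, cols], s=100,
                cmap="hot", marker="s", alpha=0.8)
ax.invert_yaxis()          # put row 0 at the top, as matshow does
ax.set_xlabel("column index")
ax.set_ylabel("row index")
fig.colorbar(sc)
# call plt.show() or fig.savefig(...) to look at the result
```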
JLT