I would like to plot a 2d histogram using matplotlib in order to visualize the influence of two variables on the occurrence of an event.

In my test case, the event is “a wish coming true”, x is the number of falling stars, and y is the involvement of a fairy godmother. I would like to plot a 2D histogram of wishes coming true, binned by falling stars and fairy godmothers. Next to each axis, I would like to show the probability of a wish coming true, event/(event+nonevent), for each bin of falling stars and fairy godmothers (a 1D bar chart with one probability per histogram bin). The bar chart bins should correspond to, and be aligned with, the 2D histogram bins. However, there seems to be a slight misalignment between the bar charts and the histogram bins.

To align the bar charts correctly, is it enough to set the axis limits to the first and last bin edges? Once these limits are set, can I pass bin centers to plt.bar() as positions along the axis, rather than indices?

My code and the resulting images are as follows:

import numpy as np
import matplotlib.pyplot as plt
from numpy import linspace
import cubehelix

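# Create random events and non-events
# Note: each seed below is set after the preceding draw, so x_noneve is drawn
# before any seed is set and differs from run to run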
x_noneve = 3.*np.random.randn(10000) +22.
np.random.seed(seed=41)

y_noneve = np.random.randn(10000)
np.random.seed(seed=45)

x_eve = 3.*np.random.randn(1000) +22.
np.random.seed(seed=33)

y_eve = np.random.randn(1000)

x_all = np.concatenate((x_eve,x_noneve),axis=0)
y_all = np.concatenate((y_eve,y_noneve),axis=0)

# Set up default x and y limits
xlims = [min(x_all),max(x_all)]
ylims = [min(y_all),max(y_all)]

# Set up your x and y labels
xlabel = 'Falling Star'
ylabel = 'Fairy Godmother'

# Define the locations for the axes
left, width = 0.12, 0.55
bottom, height = 0.12, 0.55
bottom_h = left_h = left+width+0.03

# Set up the geometry of the three plots
rect_wishes = [left, bottom, width, height]  # dimensions of wish plot
rect_histx  = [left, bottom_h, width, 0.25]  # dimensions of x-histogram
rect_histy  = [left_h, bottom, 0.25, height] # dimensions of y-histogram

# Set up the size of the figure
fig = plt.figure(1, figsize=(9.5,9))
fig.suptitle('Wishes coming true', fontsize=18, fontweight='bold')

cx1 = cubehelix.cmap(startHue=240,endHue=-300,minSat=1,maxSat=2.5,minLight=.3,maxLight=.8,gamma=.9)

# Make the three plots
axWishes = plt.axes(rect_wishes) # wishes plot
axStarx = plt.axes(rect_histx)   # x bar chart  
axFairy = plt.axes(rect_histy)   # y bar chart 

# Define the number of bins
nxbins = 50
nybins = 50
nbins = 100

# n bins need n + 1 edges
xbins = linspace(start = xlims[0], stop = xlims[1], num = nxbins + 1)
ybins = linspace(start = ylims[0], stop = ylims[1], num = nybins + 1)
xcenter = (xbins[0:-1]+xbins[1:])/2.0
ycenter = (ybins[0:-1]+ybins[1:])/2.0

# Use the exact bin widths: rounding them makes the bars slightly narrower
# or wider than the bins, so they no longer line up with the 2D histogram
delx    = xbins[1]-xbins[0]
dely    = ybins[1]-ybins[0]

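# histogram2d returns the edges in the order of its first two arguments
# (y first here), so H is indexed [y, x], which is the layout imshow expects
H, yedges, xedges = np.histogram2d(y_eve,x_eve,bins=(ybins,xbins))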
X = xcenter
Y = ycenter
H = np.where(H==0,np.nan,H) # Remove 0's from plot

# Plot the 2D histogram
cax = (axWishes.imshow(H, extent=[xlims[0],xlims[1],ylims[0],ylims[1]],
       interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))

#Plot the axes labels
axWishes.set_xlabel(xlabel,fontsize=14)
axWishes.set_ylabel(ylabel,fontsize=14)

#Set up the plot limits
axWishes.set_xlim(xlims)
axWishes.set_ylim(ylims)

#Set up the probability bins
x_eve_hist, xoutbins    = np.histogram(x_eve, bins=xbins) 
y_eve_hist, youtbins    = np.histogram(y_eve, bins=ybins) 

x_noneve_hist, xoutbins    = np.histogram(x_noneve, bins=xbins) 
y_noneve_hist, youtbins    = np.histogram(y_noneve, bins=ybins) 

probax = [eve/(eve+noneve+0.0) if eve+noneve>0 else 0 for eve,noneve in zip(x_eve_hist,x_noneve_hist)]
probay = [eve/(eve+noneve+0.0) if eve+noneve>0 else 0 for eve,noneve in zip(y_eve_hist,y_noneve_hist)]

probax = probax/np.sum(probax)
probay = probay/np.sum(probay)

probax = np.round(probax*100., decimals=0, out=None)
probay = np.round(probay*100., decimals=0, out=None)

#Plot the bar charts  

#Set up the limits
axStarx.set_xlim( xlims[0], xlims[1])
axFairy.set_ylim( ylims[0], ylims[1])

axStarx.bar(xcenter, probax, align='center', width =delx, color = 'royalblue')
axFairy.barh(ycenter,probay,align='center', height=dely, color = 'mediumorchid')

#Show the plot
plt.show()

[Image: resulting image]

[Image: hex version]

  • Hi and welcome to SO. Do you have any *issues* with your code? It seems to work, you're asking for people's *opinions* on what is better and what isn't. This isn't necessarily what SO is about. Very nice plots, by the way ;) – Aleksander Lidtke Apr 08 '16 at 09:28
  • Thanks :) and sorry for any misuse of SO (as you can see I'm new). The main issues I had with this code were (i) getting the correct alignment of the bar chart bins - I wanted to make sure that setting the x y axis limits and then using centers would result in perfect alignment of the bar chart and the 2d histo and (ii) the use of imshow. In a much more complicated version of this code, I tried imshow with origin as 'upper' and 'lower' but it always looked inverted compared to the barcharts. Unfortunately, I can't post the more complicated version. – SpicyBaguette Apr 08 '16 at 09:51
  • Have you seen this: http://matplotlib.org/examples/pylab_examples/scatter_hist.html ? – Aleksander Lidtke Apr 08 '16 at 10:01
  • That might help with verifying the limits, thanks! – SpicyBaguette Apr 08 '16 at 10:11
  • No worries :) Good luck with that. – Aleksander Lidtke Apr 08 '16 at 10:22
  • So I checked my code against the scatter plot example and everything looks okay. I made two small changes to the limits: `extent=[xbins[0],xbins[-1],ybins[0],ybins[-1]]` and `axStarx.set_xlim(axWishes.get_xlim()); axFairy.set_ylim(axWishes.get_ylim())`. – SpicyBaguette Apr 08 '16 at 13:02
  • So does it work now? If so maybe edit your question to make it clear what the problem was, what its symptoms were etc. and post your solution as an answer to help people who might run into this issue in the future - that's the point of SO :) – Aleksander Lidtke Apr 08 '16 at 13:04

1 Answer

While my original code was functional, the limits of the 2D histogram and the bar charts were not defined from the histogram bin edges, so any change to the bins resulted in a poorly aligned graph. To ensure that the limits of the graph always correspond to the limits of the histogram bins, I changed

cax = (axWishes.imshow(H, extent=[xlims[0],xlims[1],ylims[0],ylims[1]],
       interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))

to

cax = (axWishes.imshow(H, extent=[xbins[0],xbins[-1],ybins[0],ybins[-1]],
       interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))

and

axStarx.set_xlim( xlims[0], xlims[1])
axFairy.set_ylim( ylims[0], ylims[1])

to

axStarx.set_xlim(axWishes.get_xlim()) 
axFairy.set_ylim(axWishes.get_ylim())
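
As an alternative to copying the limits by hand, the marginal axes can share an axis with the main plot, so that the limits stay in sync. The following is only a minimal sketch of that idea, not the code from the question: the data, the seed, the names `ax_main`, `ax_top`, and `ax_right`, and the use of plain counts instead of probabilities in the marginal charts are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
x = 3.0 * rng.standard_normal(1000) + 22.0
y = rng.standard_normal(1000)

xbins = np.linspace(x.min(), x.max(), 51)  # 50 bins -> 51 edges
ybins = np.linspace(y.min(), y.max(), 51)

# Rows of H follow y and columns follow x, which is the layout imshow expects
H, _, _ = np.histogram2d(y, x, bins=(ybins, xbins))

fig = plt.figure(figsize=(9.5, 9))
ax_main = fig.add_axes([0.12, 0.12, 0.55, 0.55])
ax_top = fig.add_axes([0.12, 0.70, 0.55, 0.25], sharex=ax_main)
ax_right = fig.add_axes([0.70, 0.12, 0.25, 0.55], sharey=ax_main)

ax_main.imshow(H, extent=[xbins[0], xbins[-1], ybins[0], ybins[-1]],
               interpolation="nearest", origin="lower", aspect="auto")

# Marginal counts, drawn from the left/bottom bin edges with the exact bin widths
ax_top.bar(xbins[:-1], H.sum(axis=0), width=np.diff(xbins), align="edge")
ax_right.barh(ybins[:-1], H.sum(axis=1), height=np.diff(ybins), align="edge")

# Setting the limits once on the main axes propagates to the shared axes
ax_main.set_xlim(xbins[0], xbins[-1])
ax_main.set_ylim(ybins[0], ybins[-1])
```

With shared axes, later zooming or panning in the main plot should also move the bar charts, which copying the limits once does not do.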

For reference, the bar positions passed to plt.bar() can be either integer indices or data values along the axis. When the bars represent bins rather than categorical variables, the axis limits and the bar widths must match the bin edges. A histogram function such as plt.hist() does this automatically. However, if you wish to plot a quantity other than the number of members per bin, you must use a bar chart and define the limits and widths by hand.
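
The per-bin quantities can all be derived from a single array of bin edges, which keeps the bars and the histogram bins consistent by construction. Below is a minimal sketch under assumed inputs: the seed and the variable names `edges`, `centers`, `widths`, and `prob` are illustrative and not taken from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
x_eve = 3.0 * rng.standard_normal(1000) + 22.0
x_noneve = 3.0 * rng.standard_normal(10000) + 22.0
x_all = np.concatenate((x_eve, x_noneve))

nbins = 50
edges = np.linspace(x_all.min(), x_all.max(), nbins + 1)  # n bins -> n + 1 edges

# Count events and non-events with the same edges
eve_counts, _ = np.histogram(x_eve, bins=edges)
noneve_counts, _ = np.histogram(x_noneve, bins=edges)
total = eve_counts + noneve_counts

# event / (event + nonevent) per bin, leaving empty bins at 0
prob = np.divide(eve_counts, total, out=np.zeros(nbins), where=total > 0)

# Bar geometry taken directly from the edges, with no rounding
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)
```

These arrays can then be passed to something like `ax.bar(centers, prob, width=widths, align='center')`, with the axis limits set to `edges[0]` and `edges[-1]`.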