
I'm trying to make a function which produces attractive-looking scatter plots. I have two conflicting desires:

  • Individual, separate data points are visible;
  • Multiple data points that are close together, such that their dots overlap, should darken.

I currently accomplish the latter through the alpha channel. The former, I accomplish by including a copy of the scatter plot which does not have an alpha channel:

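```python
import numpy as np
import matplotlib.pyplot as plt

N = 100000

fig = plt.figure()
fig.set_facecolor("white")

x = np.random.randn(N)
y = np.random.randn(N)

base_colour = (0.25, 0.4, 0.6)
# Lighter tint of the base colour: move each channel 75% of the way to white.
adj_colour = tuple(0.75 + 0.25 * c for c in base_colour)
# Opaque, light layer: every point stays individually visible.
plt.scatter(x, y, color=adj_colour, linewidth=0)
# Translucent, dark layer: overlapping points darken the light layer.
plt.scatter(x, y, color=base_colour, alpha=0.05, linewidth=0)
plt.show()
```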

This produces a picture such as the following (depending on your RNG):

[Figure: example scatter plot]

Note how the "outliers" are individually visible, but the centre is darker than the outer edges, implying that data points are more densely distributed there.

Note also, however, that most of the central area is the same shade of blue: the alpha is high enough that many overlapping points together have an effective alpha of approximately 1. (In fact, it's so close to 1 that every pixel in the middle is the exact same shade of blue.) Because of how alpha compositing works, the amount of "white" left in each pixel decays exponentially with the number of points overlapping it.
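To make the exponential decay concrete: a marker with opacity `alpha` drawn over a pixel keeps a fraction `(1 - alpha)` of whatever was underneath, so `n` stacked markers leave `(1 - alpha) ** n` of the original background. A minimal sketch (the function name `remaining_background` is hypothetical):

```python
def remaining_background(alpha: float, n: int) -> float:
    """Fraction of the background colour still visible after n identical
    markers with opacity `alpha` are drawn on top of the same pixel."""
    # Each "over" composite keeps (1 - alpha) of what lay beneath,
    # so n layers multiply that factor n times.
    return (1.0 - alpha) ** n


for n in (1, 10, 50, 100, 200):
    print(f"{n:4d} overlapping points -> {remaining_background(0.05, n):.4f} background left")
```

With the question's `alpha=0.05`, about 60% of the background survives 10 overlapping points, but only about 0.6% survives 100, which is why the dense core renders as one flat colour.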

I could use a lower alpha. However, that wouldn't look as nice for graphs with significantly fewer data points, or for areas that are less densely populated. Is there any way around this, or will I have to make the user of my function type in an alpha value that works nicely for their data?
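One way to avoid asking the user for an alpha would be to derive it from the data. The sketch below is a hypothetical heuristic, not a matplotlib feature: it uses `np.histogram2d` to find the most crowded cell (`n_max` points) and solves `1 - (1 - alpha) ** n_max = target_opacity` for `alpha`. The names `auto_alpha`, `bins` and `target_opacity` are illustrative assumptions.

```python
import numpy as np


def auto_alpha(x, y, bins=200, target_opacity=0.95):
    """Hypothetical heuristic: choose one global alpha so that the densest
    histogram cell ends up at roughly `target_opacity`.

    Assumes a 2-D histogram cell is a rough proxy for "points whose markers
    overlap"; how good that proxy is depends on marker size and figure DPI.
    """
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    n_max = max(int(counts.max()), 1)
    # Solve 1 - (1 - alpha) ** n_max = target_opacity for alpha.
    return 1.0 - (1.0 - target_opacity) ** (1.0 / n_max)
```

Usage would be `plt.scatter(x, y, color=base_colour, alpha=auto_alpha(x, y), linewidth=0)`. Sparser data gives a smaller `n_max` and therefore a larger alpha, so small plots stay visible while large ones only approach full opacity in their densest cell. `bins` probably needs tuning to the marker size.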

Otherwise, is there a way to accomplish what I'm doing without making two scatter plots in the same figure?
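A single `scatter` call can carry a different alpha per point by passing an N×4 RGBA array as `c`. Along the lines of the KDE idea in the comments, but using 2-D histogram counts as a cheap density estimate (since a KDE evaluated at hundreds of thousands of points is too slow), a sketch could look like this. `density_alpha`, `alpha_min` and `alpha_max` are hypothetical names, `bins` is assumed to be an integer, and the log mapping is an arbitrary choice.

```python
import numpy as np


def density_alpha(x, y, bins=100, alpha_min=0.02, alpha_max=1.0):
    """Hypothetical helper: one alpha per point, high in sparse regions and
    low in dense ones, using 2-D histogram counts as a cheap density estimate."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Look up each point's bin; clip so values on the right-most edge land in the last bin.
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, bins - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, bins - 1)
    density = counts[ix, iy]  # >= 1, since every point counts towards its own bin
    max_count = counts.max()
    if max_count <= 1:
        return np.full(len(x), alpha_max)
    # Log scale keeps alpha changing across several orders of magnitude of density.
    t = np.log(density) / np.log(max_count)
    return alpha_max + (alpha_min - alpha_max) * t


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    N = 100000
    x = np.random.randn(N)
    y = np.random.randn(N)

    base_colour = (0.25, 0.4, 0.6)
    rgba = np.empty((N, 4))
    rgba[:, :3] = base_colour         # same RGB for every point
    rgba[:, 3] = density_alpha(x, y)  # per-point alpha
    plt.scatter(x, y, c=rgba, linewidth=0)
    plt.show()
```

Outliers end up fully opaque in `base_colour` rather than in the lighter tint, so this replaces the two-layer trick rather than reproducing it exactly.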

acdr
  • how many data points do you have? I have an idea that you could fit a KDE to the data, and then evaluate the KDE at the location of each datapoint, scaling alpha according to some function of the KDE density there. That way you can have low alpha in high density regions, and high alpha in low density regions. – Angus Williams Nov 16 '16 at 16:18
  • You could use a [Slider](http://matplotlib.org/examples/widgets/slider_demo.html) to make the user interactively change the alpha. You could also plot a histogram from your data simply like [this](http://matplotlib.org/examples/pylab_examples/hist2d_log_demo.html) or any of [those](http://stackoverflow.com/questions/27156381/python-creating-a-2d-histogram-from-a-numpy-matrix). – ImportanceOfBeingErnest Nov 16 '16 at 17:29
  • @Angus: I have hundreds of thousands of data points, so calculating a KDE for each of them isn't really an option. I tried just calculating a 2-d histogram, so with square bins, but that makes the graph look block-y. – acdr Nov 17 '16 at 08:52
  • You could try `hexbin`? Or you could make a 2D histogram and then use bicubic interpolation with `imshow` to make the plot less 'block-y'? – Angus Williams Nov 17 '16 at 08:57
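The binned alternatives suggested in the comments (`hexbin`, and a 2-D histogram drawn with `imshow` using bicubic interpolation), combined with the log colour scale from the linked `hist2d` demo, might look roughly like this. It is a sketch: `gridsize=60`, `bins=100` and the `Blues` colormap are arbitrary choices, and `log10(count + 1)` is used so empty bins stay finite.

```python
import numpy as np
import matplotlib.pyplot as plt

N = 100000
x = np.random.randn(N)
y = np.random.randn(N)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: hexagonal bins with a logarithmic colour scale, so the dense
# core keeps showing gradations; mincnt=1 leaves empty hexagons undrawn.
hb = ax1.hexbin(x, y, gridsize=60, bins="log", mincnt=1, cmap="Blues")
fig.colorbar(hb, ax=ax1, label="log-scaled count")

# Option 2: square 2-D histogram, smoothed for display by bicubic interpolation.
counts, xedges, yedges = np.histogram2d(x, y, bins=100)
im = ax2.imshow(
    np.log10(counts.T + 1),  # transpose: histogram2d indexes [x, y], imshow wants rows = y
    origin="lower",          # first row at the bottom, matching the y axis
    extent=(xedges[0], xedges[-1], yedges[0], yedges[-1]),
    interpolation="bicubic",
    cmap="Blues",
)
fig.colorbar(im, ax=ax2, label="log10(count + 1)")

plt.show()
```

Neither option shows individual outliers as crisply as a scatter plot does; the log scale is what keeps the dense centre from saturating.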
