In my application I'm transitioning from R to native Python (scipy + matplotlib) where possible, and one of the biggest tasks was converting an R heatmap to a matplotlib heatmap. This post guided me through the porting. While most of it was painless, I'm still not convinced about the colormap.

Before showing code, an explanation: in the R code I defined "breaks", i.e. a fixed number of points starting from the lowest value up to 10, and ideally centered on the median value of the data. The equivalent in Python can be built with numpy.linspace:

# matrix is a pandas DataFrame
import numpy as np

data_min = min(matrix.min(skipna=True))
data_max = max(matrix.max(skipna=True))
median_value = np.median(matrix.median(skipna=True))

range_min = np.linspace(0, median_value, 50)
range_max = np.linspace(median_value, data_max, 50)
breaks = np.concatenate((range_min, range_max))

This gives us 100 points that will be used for coloring. However, I'm not sure how to do the exact same thing with a matplotlib colormap. Currently I have:

import matplotlib as mpl

def red_black_green():
    cdict = {
       'red': ((0.0, 0.0, 0.0),
               (0.5, 0.0, 0.0),
               (1.0, 1.0, 1.0)),
       'blue': ((0.0, 0.0, 0.0),
                (1.0, 0.0, 0.0)),
       'green': ((0.0, 0.0, 1.0),
                 (0.5, 0.0, 0.0),
                 (1.0, 0.0, 0.0))
       }

    my_cmap = mpl.colors.LinearSegmentedColormap(
        'my_colormap', cdict, 100)

    return my_cmap

And further down I do:

# Note: vmin and vmax are the minimum and the maximum of the data

# Adjust the max and min to scale these colors
if vmin > 0:
    norm = mpl.colors.Normalize(vmin=0, vmax=vmax / 1.08)
else:
    norm = mpl.colors.Normalize(vmin / 2, vmax / 2)

The numbers are totally empirical, which is why I want to change this into something more robust. How can I normalize my colormap based on the median, or do I need normalization at all?

Einar

1 Answer

By default, matplotlib normalises the colormap linearly, so that the lowest colormap value corresponds to the minimum of your data and the highest to the maximum. This means that the middle of the colormap lines up with the midpoint of your data range, (min + max) / 2, whether or not a data point falls exactly at that value.

Here's an example:

from numpy.random import rand
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

cdict = {'red':   ((0.0, 0.0, 0.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 1.0, 1.0)),
         'blue':  ((0.0, 0.0, 0.0),
                   (1.0, 0.0, 0.0)),
         'green': ((0.0, 0.0, 1.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 0.0, 0.0))}

cmap = mcolors.LinearSegmentedColormap(
    'my_colormap', cdict, 100)

ax = plt.subplot(111)
im = ax.imshow(2*rand(20, 20) + 1.5, cmap=cmap)
plt.colorbar(im)
plt.show()

Notice that the middle of the colour bar takes a value of about 2.5 (the data are random, so the exact limits vary slightly). This is the median of the data range: (min + max) / 2 = (1.5 + 3.5) / 2 = 2.5.
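If you want the middle of the colormap to sit on the median of the data values rather than on (min + max) / 2, one option is a two-slope norm. The following is only a sketch: it assumes matplotlib 3.2 or newer (where `TwoSlopeNorm` is available), and the skewed `data` array is made up to stand in for the matrix in the question.

```python
import numpy as np
import matplotlib.colors as mcolors

# Hypothetical skewed data standing in for the question's matrix
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=(20, 20))

vmin, vmax = data.min(), data.max()
median = np.median(data)
mid_range = (vmin + vmax) / 2

# Default linear scaling: the colormap midpoint (0.5) sits at the mid-range
linear = mcolors.Normalize(vmin=vmin, vmax=vmax)

# Two-slope scaling: the colormap midpoint (0.5) sits at the median instead.
# Requires vmin < vcenter < vmax.
centered = mcolors.TwoSlopeNorm(vcenter=median, vmin=vmin, vmax=vmax)

print("mid-range maps to", float(linear(mid_range)), "under Normalize")
print("median maps to", float(linear(median)), "under Normalize")
print("median maps to", float(centered(median)), "under TwoSlopeNorm")
```

To use it, pass `norm=centered` together with `cmap=cmap` to `imshow` (and leave out `vmin`/`vmax` in that call, since the norm already carries them). Values below the median are then stretched over the lower half of the colormap and values above it over the upper half.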

Hope this helps.

dmcdougall
  • (min + max)/2 is not actually the median, but rather the [mid-range](http://en.wikipedia.org/wiki/Mid-range) – Mr. Squig Oct 17 '12 at 13:36
  • Fair point. I think the definition used in the field of statistics is the value in the middle of your data array after it has been sorted. Thanks for pointing that out, and apologies if I caused any confusion. – dmcdougall Oct 17 '12 at 16:27
  • On my system this produces colors that range from red through black to green. https://imgur.com/a/OWZhjfP I'd like the map to range smoothly from red through blue to green. Any pointers? – John Apr 26 '18 at 18:13
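
For a map that passes through blue instead of black, one possible approach is to build the colormap from a list of anchor colours rather than from a `cdict`. This is a rough sketch, not a tested drop-in for the answer above; the name `red_blue_green` is made up for illustration, and the colours are given as explicit RGB tuples because matplotlib's named `'green'` is a darker shade than pure green.

```python
import matplotlib.colors as mcolors

# Anchor colours are spaced evenly along the map:
# red at 0.0, blue at 0.5, green at 1.0
anchors = [(1.0, 0.0, 0.0),   # red
           (0.0, 0.0, 1.0),   # blue
           (0.0, 1.0, 0.0)]   # green

red_blue_green = mcolors.LinearSegmentedColormap.from_list(
    'red_blue_green', anchors, N=100)

# Calling the colormap with a float in [0, 1] returns an RGBA tuple
low = red_blue_green(0.0)
middle = red_blue_green(0.5)
high = red_blue_green(1.0)
print(low, middle, high)
```

It can be passed to `imshow` through the `cmap` argument in the same way as the `cdict`-based map.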