13

I think this is a simple question, but I can't seem to find a simple solution. I have a set of molecular abundance data, with values spanning many orders of magnitude. I want to represent these abundances with boxplots (box-and-whisker plots), and I want the boxes to be calculated on a log scale because of the wide range of values. I know I can calculate the log10 of the data and pass it to matplotlib's boxplot, but then the plot no longer has a logarithmic axis.

So my question is basically this: when I have calculated a boxplot based on the log10 of my values, how do I convert the plot afterward so that it is shown on a logarithmic scale rather than a linear scale of log10 values? I can change the tick labels to partly fix this, but I have no clue how to get a logarithmic scale back onto the plot.

Or is there a more direct way to plot this? Perhaps a different package that already includes this option?

Many thanks for the help.

Frank
  • Why not convert your log10 calculated values back to normal values (`10**y`) and set the y-scale to be logarithmic? –  Jan 05 '16 at 10:03
  • 1
    Maybe I should clarify that I create the plot like this: `bp = ax.boxplot(np.log10(abunds))`. This command calculates the box values and creates the plot. I will need to change things in the plot, not the values, right? – Frank Jan 05 '16 at 10:10
  • 2
    The way you're doing it, you are plotting different things. I still don't understand why you can't do `bp = ax.boxplot(abunds); ax.set_yscale('log')`. That will give you a log-scale, and thus the y-ticks properly correspond to your values. –  Jan 05 '16 at 12:22
  • Because the log-values are negative (values are 10^(-4) and lower), so I get an error with `ax.set_yscale('log')` – Frank Jan 05 '16 at 12:42
  • 2
    Tobias your *log* values are negative, but your original `abunds` values should not be. Are you sure you did exactly what @Evert suggested? – Andras Deak -- Слава Україні Jan 05 '16 at 12:55
  • 1
    Sorry, my mistake: I thought he suggested `ax.boxplot(np.log10(abunds))`. However, I don't think it will calculate the boxes on a logarithmic scale in this case. There is too much spread in the data, which causes a lot of outliers – Frank Jan 05 '16 at 13:09

2 Answers

11

I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).

Instead, you can plot the logarithm of the data and manually adjust the y-labels.

Here is a very crude example:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

values = 10 ** np.random.uniform(-3, 3, size=100)

fig = plt.figure(figsize=(9, 3))


ax = plt.subplot(1, 3, 1)

ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')

ax = plt.subplot(1, 3, 2)

ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')

ax = plt.subplot(1, 3, 3)

ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')

plt.show()

[Figure: the resulting boxplots, in three panels titled "log", "raw", and "5%"]

The middle figure (titled "raw") shows the box plot on the raw values. This leads to many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
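The effect on the outlier count can be checked numerically. Here is a minimal sketch, assuming log-uniform test data similar to the example above (drawn with a different random generator, so the exact numbers will differ) and using `matplotlib.cbook.boxplot_stats`, which computes the statistics that `boxplot` draws:

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Assumed test data: log-uniform values spanning six orders of magnitude.
rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)

# Box statistics on the raw values and on their log10 (default whis=1.5).
raw = boxplot_stats(values)[0]
logged = boxplot_stats(np.log10(values))[0]

n_raw = len(raw["fliers"])      # points beyond the whiskers, raw scale
n_log = len(logged["fliers"])   # points beyond the whiskers, log scale

print(f"outliers on raw values:   {n_raw}")
print(f"outliers on log10 values: {n_log}")
```

On the raw scale the lower whisker limit falls below zero, so every flagged point lies above the box; in log space the 1.5 × IQR limits are symmetric around the box.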

Alternatively, you can draw the whiskers at given percentiles: `ax.boxplot(values, whis=[5, 95])`. In this case a fixed fraction of the points (5%) falls outside the whiskers on each side and is drawn as outliers.
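A third possibility, not part of the answer above, is to compute the box statistics in log space, transform them back to data units, and draw them with `Axes.bxp` on a real logarithmic axis. This is only a sketch under the assumption that all values are strictly positive; the helper name `log_boxplot_stats` is made up for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats


def log_boxplot_stats(data, whis=1.5):
    """Compute box statistics on log10(data), then map them back to data units.

    Hypothetical helper; assumes every value in `data` is strictly positive.
    """
    stats = []
    for s in boxplot_stats(np.log10(data), whis=whis):
        stats.append({
            "med": 10 ** s["med"],
            "q1": 10 ** s["q1"],
            "q3": 10 ** s["q3"],
            "whislo": 10 ** s["whislo"],
            "whishi": 10 ** s["whishi"],
            "fliers": 10 ** np.asarray(s["fliers"], dtype=float),
        })
    return stats


# Assumed test data: log-uniform values spanning six orders of magnitude.
rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)

fig, ax = plt.subplots(figsize=(3, 3))
ax.bxp(log_boxplot_stats(values))  # draw the precomputed statistics
ax.set_yscale("log")               # real log axis, including minor ticks
ax.set_title("log stats, log axis")
# Call plt.show() or fig.savefig(...) to view the result.
```

Because the axis itself is logarithmic, the tick labels and minor ticks come from matplotlib's log scale and do not have to be set by hand.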

MB-F
  • Thank you for the nice example. Is there a way to add also minor ticks for the log plot as they are in the raw plot? – Frank Jan 05 '16 at 13:25
  • I don't know, sorry. Maybe it's possible with `matplotlib.ticker`: http://matplotlib.org/examples/pylab_examples/major_minor_demo1.html – MB-F Jan 05 '16 at 13:35
  • I could set minor ticks following a similar logic to the major ticks. For example, to set minor ticks at positions `1, 2, ..., 9, 20, 30, ..., 90`, compute their log10 and set them as minor ticks: `minor_xticks = np.log10(np.concatenate((np.arange(1, 10), np.arange(1, 10) * 10)).astype(float))` `ax.set_xticks(minor_xticks, minor=True)` – fabiocapsouza Sep 01 '20 at 04:22
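The minor-tick idea from the last comment can be sketched for the y-axis of the "log" panel as follows. The helper `log10_minor_ticks` is a made-up name, and the decade range of -3 to 3 is an assumption that matches the example data:

```python
import numpy as np
import matplotlib.pyplot as plt


def log10_minor_ticks(lo_decade, hi_decade):
    """Return log10 of 2..9 times each power of ten from lo_decade up to hi_decade - 1.

    Hypothetical helper: positions are in log10 units, for an axis showing log10(data).
    """
    decades = 10.0 ** np.arange(lo_decade, hi_decade)
    multiples = np.arange(2, 10)
    return np.log10(np.outer(decades, multiples)).ravel()


# Assumed test data: log-uniform values spanning six orders of magnitude.
rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)

fig, ax = plt.subplots(figsize=(3, 3))
ax.boxplot(np.log10(values))

# Major ticks at whole decades, labelled with the original values.
exponents = np.arange(-3, 4)
ax.set_yticks(exponents)
ax.set_yticklabels([f"{10.0 ** e:g}" for e in exponents])

# Minor ticks at 2..9 times each decade, placed in log10 units.
ax.set_yticks(log10_minor_ticks(-3, 3), minor=True)
# Call plt.show() or fig.savefig(...) to view the result.
```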
3

You can use plt.yscale:

plt.boxplot(data); plt.yscale('log')
Ferro