I'm doing a log-log plot with Seaborn; the data is actually derived from a StackOverflow developer survey. I tried using the built-in log scale, but the results didn't make sense, so this simply calculates the logs before plotting.

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame( {'company_size_range': {7800: 7.0, 7801: 700.0, 7802: 7.0, 7803: 20000.0, 7805: 200.0, 7806: 20000.0, 7808: 2000.0, 7809: 2000.0, 7810: 7.0, 7811: 200.0, 7812: 50.0, 7813: 20000.0, 7816: 2.0, 7819: 200.0, 7820: 2000.0, 7824: 2.0, 7825: 2.0, 7827: 2.0, 7828: 50.0, 7830: 14.0, 7831: 50.0, 7833: 200.0, 7834: 50.0, 7835: 50.0, 7838: 2.0, 7840: 50.0, 7841: 50.0, 7842: 7000.0, 7843: 20000.0, 7844: 14.0, 7846: 2.0, 7850: 20000.0, 7851: 700.0, 7852: 200.0, 7853: 200.0, 7855: 200.0, 7856: 7.0, 7857: 50.0, 7858: 700.0, 7861: 20000.0, 7863: 20000.0, 7865: 20000.0, 7867: 700.0, 7868: 20000.0, 7870: 50.0, 7871: 2000.0, 7872: 50.0, 7873: 20000.0, 7874: 200.0, 7876: 14.0, 7877: 20000.0, 7879: 50.0, 7880: 50.0 }, 'team_size_range': {7800: 7.0, 7801: 7.0, 7802: 7.0, 7803: 2.0, 7805: 7.0, 7806: 2.0, 7808: 7.0, 7809: 7.0, 7810: 2.0, 7811: 17.0, 7812: 7.0, 7813: 2.0, 7816: 2.0, 7819: 7.0, 7820: 30.0, 7824: 2.0, 7825: 2.0, 7827: 2.0, 7828: 2.0, 7830: 2.0, 7831: 7.0, 7833: 2.0, 7834: 2.0, 7835: 7.0, 7838: 2.0, 7840: 7.0, 7841: 30.0, 7842: 7.0, 7843: 7.0, 7844: 2.0, 7846: 2.0, 7850: 7.0, 7851: 11.0, 7852: 7.0, 7853: 7.0, 7855: 2.0, 7856: 7.0, 7857: 7.0, 7858: 11.0, 7861: 7.0, 7863: 2.0, 7865: 30.0, 7867: 7.0, 7868: 7.0, 7870: 2.0, 7871: 17.0, 7872: 7.0, 7873: 17.0, 7874: 7.0, 7876: 2.0, 7877: 7.0, 7879: 17.0, 7880: 7.0}} )
g = sns.jointplot(x=np.log10(df['company_size_range'] + 1),
                  y=np.log10(df['team_size_range'] + 1), kind='kde', color='g')

That's fine, but the axes show the log values, not the underlying values. The X-axis, for example, is:

-1, 1, 2, 3, 4, 5, 6

So I added this to fix it, using the X position of the labels as the X values:

g.ax_joint.set_xticklabels(["{:.0f}".format(10**label.get_position()[0]-1) 
                            for label in g.ax_joint.get_xticklabels()])

The trouble is the resulting X-axis labels are nonsense:

1, 2, 3, 5, 9, 0, 0, 0

What is going on, and how best to fix it, please?

Trenton McKinney
CharlesW

2 Answers

1

You could use a FuncFormatter. Its benefit is that the labels are computed from the tick positions every time the axis is drawn, so they stay correct after resizing or zooming the window.

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import numpy as np
import pandas as pd
import seaborn as sns

def tickformat_pow10(value, tick_number):
    return f'{10**value:,.0f}'

# df = ...
g = sns.jointplot(x=np.log10(df['company_size_range'] + 1),
                  y=np.log10(df['team_size_range'] + 1), kind='kde', color='g')

g.ax_joint.xaxis.set_major_formatter(FuncFormatter(tickformat_pow10))
g.ax_joint.yaxis.set_major_formatter(FuncFormatter(tickformat_pow10))
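
Since the question adds 1 before taking the log, the formatter can also undo that offset on both axes. The following is only a minimal sketch: the name `tickformat_log10p1` is hypothetical, and it assumes the plotted values are `log10(x + 1)` as in the question.

```python
def tickformat_log10p1(value, tick_number):
    """Label a tick located at log10(x + 1) with the original value x."""
    # Invert the transform assumed from the question: x = 10**value - 1
    return f'{10**value - 1:,.0f}'

# Assumed usage with the JointGrid `g` created above:
# g.ax_joint.xaxis.set_major_formatter(FuncFormatter(tickformat_log10p1))
# g.ax_joint.yaxis.set_major_formatter(FuncFormatter(tickformat_log10p1))
```

Note that ticks at whole log values then read 0, 9, 99, 999 and so on, which is exact but not especially pretty.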

JohanC
  • Yo! Elegant! Thank you very much. @JohanC – CharlesW May 13 '20 at 15:49
  • And correcting the oddity that the Y axis should be one less: g.ax_joint.xaxis.set_major_formatter(FuncFormatter(lambda value, n: f'{10**value:,.0f}' )) g.ax_joint.yaxis.set_major_formatter(FuncFormatter(lambda value, n: f'{10**value-1:,.0f}' )) – CharlesW May 13 '20 at 15:55
0

Try the following, calling canvas.draw() first so that the tick labels get their final positions before you read them. Also, I do not understand why you are subtracting 1.

g.fig.canvas.draw()

g.ax_joint.set_xticklabels(["{:.0f}".format(10**label.get_position()[0]-1) 
                            for label in g.ax_joint.get_xticklabels()]);
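
If fixed labels are acceptable, another option might be to choose the tick positions yourself, so that they fall on round values of the original data. This is a rough sketch, not a tested recipe for this plot: the helper `round_value_ticks` and the list of round values are hypothetical, and it assumes the data was plotted as `log10(x + 1)` as in the question.

```python
import math

def round_value_ticks(values):
    """Return tick positions on the log10(x + 1) axis and matching labels."""
    positions = [math.log10(v + 1) for v in values]  # where each value sits on the axis
    labels = [f'{v:,.0f}' for v in values]           # what to print at that position
    return positions, labels

positions, labels = round_value_ticks([1, 10, 100, 1000, 10000])

# Assumed usage with the JointGrid `g` from the question:
# g.ax_joint.set_xticks(positions)
# g.ax_joint.set_xticklabels(labels)
```

Setting the positions and the labels together keeps them in step, so there is no need to read the labels back from the axis.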

Sheldore
  • Thank you! I take it calling canvas.draw() gets it to precalculate the labels, so they can be changed... The subtract 1 is standard stats - I added one before taking the log. But it doesn't look right on the axis, as you point out – CharlesW May 13 '20 at 15:29