
I am currently using seaborn.heatmap() to display binary data that I have organized in a pandas.DataFrame. The index of the DataFrame is discrete and corresponds to different locations, while the columns are continuous and represent time. How can I make the x-axis of the heatmap space the columns in proportion to their measurement values?

To be more precise, I want the distance between 0 and 1'000 to be 1'000 times bigger than the distance between 0 and 1, and 10'000 times bigger than the distance between 1 and 1.1. Here is a minimal example of how my data is organised:

import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

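# 5 locations (rows) x 8 measurement times (columns), filled with random 0/1
df = pd.DataFrame(np.random.randint(0, 2, size=(5, 8)),
                  columns=[1, 1.1, 2, 3, 4, 1001, 1002, 1003],
                  index=['A', 'B', 'C', 'D', 'E'])
sns.heatmap(df, cmap='binary', square=True)
plt.show()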

The resulting image looks like this: https://i.stack.imgur.com/uxSrH.png

The data between the measurements (e.g. for measurement value 500, which is not part of the DataFrame) should be 0. I do not mind giving up square=True.

For those of you wondering, the 0/1 values are False/True flags that indicate whether or not I made a measurement at a given sampling site at a given time.

Thank you so much

nevs
  • I am not sure I follow. You have binary data, so you cannot make a "heatmap". You will only have two colors no matter what, if your data are only 0's and 1's. And since the x-axis is also discrete, you will not get the spacing you are looking for, and I don't think that is what you want. – Karl Oct 18 '21 at 12:22
  • Thanks for your reply. It should only be heatmap-like and show where I have a measurement – nevs Oct 18 '21 at 12:46

1 Answer


You could use plt.pcolor(), which can draw an unevenly spaced grid: its first and second parameters give the positions of the grid lines in x and y. As a 5x8 grid of cells needs 6x9 grid lines, both the list of x-values and the list of y-values need to be extended by one.
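As a minimal sketch of that bookkeeping: the helper name `grid_edges` is made up for illustration, and closing the last cell by repeating the final spacing is just one reasonable choice, not something pcolor requires.

```python
import numpy as np

def grid_edges(values):
    """Return len(values) + 1 grid-line positions (hypothetical helper).

    Each value is used as the left edge of its cell; the extra right-most
    edge is extrapolated by repeating the last spacing.
    """
    values = np.asarray(values, dtype=float)
    last_edge = 2 * values[-1] - values[-2]
    return np.append(values, last_edge)

x_edges = grid_edges([1, 1.1, 2, 3, 4, 101, 102, 103])  # 8 columns -> 9 edges
y_edges = np.arange(5 + 1)                               # 5 rows    -> 6 edges
print(len(x_edges), len(y_edges))
```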

The example uses 101 instead of 1001, because a gap of about 1000 would squeeze everything into a thin line, except for the area between 4 and 1001.
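To put a rough number on that, here is a small sketch that computes the share of the x-axis each cell would occupy with the original column values from the question. It assumes the last cell is closed by repeating the final spacing; the variable names are illustrative only.

```python
import numpy as np

# Column values from the question, plus one extrapolated right-most edge
columns = np.array([1, 1.1, 2, 3, 4, 1001, 1002, 1003], dtype=float)
edges = np.append(columns, 2 * columns[-1] - columns[-2])

widths = np.diff(edges)            # width of each of the 8 cells
fractions = widths / widths.sum()  # share of the total axis length

for left, frac in zip(columns, fractions):
    print(f"cell starting at {left:>6}: {frac:.4%} of the axis")
```

The cell that starts at 4 and runs to 1001 takes up more than 99% of the axis, so all the other cells together share less than 1%.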

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# slightly modified example data
df = pd.DataFrame(np.random.randint(0, 2, size=(5, 8)), columns=[1, 1.1, 2, 3, 4, 101, 102, 103],
                  index=['A', 'B', 'C', 'D', 'E'])
plt.pcolor(list(df.columns) + [2 * df.columns[-1] - df.columns[-2]],
           np.arange(len(df.index)+1),
           df.values, cmap='binary')
plt.yticks(np.arange(0.5, len(df.index)), df.index) # labels between the grid lines
plt.show()

[Image: plt.pcolor to create a grid]
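Note that with this approach each cell stretches all the way to the next column value. If the space between measurements should instead stay at 0, as the question asks, one possible variation is to interleave each data column with an all-zero gap column. The sketch below only prepares the arrays; `gap_grid` and its `width` parameter are hypothetical names, and the cell width of 0.5 is an arbitrary assumption.

```python
import numpy as np

def gap_grid(times, values, width):
    """Interleave each data column with an all-zero gap column.

    times  : sorted 1-D sequence of column positions (length n)
    values : 2-D array of shape (rows, n)
    width  : drawn width of each measurement cell (assumed, user-chosen)
    Returns x edges of length 2n and a cell array of shape (rows, 2n - 1).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values)
    n = len(times)
    # Right edge of each cell, clipped so it never overlaps the next column
    next_left = np.append(times[1:], np.inf)
    right = np.minimum(times + width, next_left)
    # Edges alternate: left_0, right_0, left_1, right_1, ...
    x_edges = np.empty(2 * n)
    x_edges[0::2] = times
    x_edges[1::2] = right
    # Even-numbered cells hold the data; odd-numbered cells are gaps and stay 0
    cells = np.zeros((values.shape[0], 2 * n - 1), dtype=values.dtype)
    cells[:, 0::2] = values
    return x_edges, cells

times = [1, 1.1, 2, 3, 4, 101, 102, 103]
values = np.random.default_rng(0).integers(0, 2, size=(5, 8))
x_edges, cells = gap_grid(times, values, width=0.5)
y_edges = np.arange(values.shape[0] + 1)
print(x_edges.shape, cells.shape)
```

The three arrays can then be passed as plt.pcolor(x_edges, y_edges, cells, cmap='binary'). On a linear axis this wide the measurement cells will be very thin, so a larger width or zooming in may be needed to see them.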

JohanC