How to cross hatch data gaps in a heatmap

Question

I need help plotting a heat map that should look like this:

For the x axis: an array of years from 1975 to 2018, i.e. [1975, ..., 2018].

For the y axis: an array of months, January to December.

The value at each x-y intersection can be 1, 2 or 3, as shown in the image.

In the image, crosses represent data gaps and white cells represent zero (0) values.

Update:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('Events_in_Month_and_Year.xlsx',encoding = 'unicode_escape',error_bad_lines=False
                )
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months]  # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
spines = ax.collections[0].colorbar.ax.spines
for s in spines:
    spines[s].set_visible(True) # show border around colorbar
plt.tight_layout()
plt.show()

I have tried this code, but I get the following error:

Error tokenizing data. C error: Buffer overflow caught - possible malformed input file

My data is stored in an .xlsx file, which looks like this:

Hi, thanks for your answer. Can you tell me how the data should be saved (.txt, .xlsx or .doc) so that the code can read it? — Prater, Sep 28 '21 at 09:26
You need to do "save as" as a "csv" file. Then you can do `pivoted = pd.read_csv('Your_file.csv')`. As your csv already has the counts, there is no need to call `df.pivot_table()` nor to reorder the columns. Note that the file you posted doesn't seem to make a difference between `0` and `data gaps`. — JohanC, Sep 29 '21 at 12:22
Thank you for your answer. The data gaps are generally the years that are not in the data file, i.e. 1976, 1977, 1989, etc. A zero in the data represents no event in that month of the respective year, so I decided to represent it by a blank space, i.e. white. — Prater, Sep 30 '21 at 05:00

JohanC · Accepted Answer · 2021-09-30T13:42:45.893

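You can use sns.heatmap to create the heatmap. Seaborn leaves cells containing NaN undrawn, so the axes background shows through at the data gaps. You can hatch that background via ax.patch.set_hatch('xx') (more xs means a tighter hatch pattern). See the matplotlib hatch style gallery for more hatch options.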

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'month': np.random.choice(months, 1000), 'year': np.random.randint(1975, 2019, 1000)})
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months]  # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
plt.tight_layout()
plt.show()
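
To illustrate why the hatching only appears at the gaps, here is a minimal sketch using plain matplotlib instead of seaborn. The tiny 2x3 array is made up for illustration. NaN cells are masked, masked cells are not drawn, and the hatched axes background stays visible there.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up example data: two NaN entries stand for data gaps
data = np.array([[0, 1, np.nan],
                 [2, np.nan, 3]])
masked = np.ma.masked_invalid(data)  # mask the NaN cells

fig, ax = plt.subplots()
# masked cells are skipped by pcolormesh, so the background shows through
mesh = ax.pcolormesh(masked, cmap='Greys', vmin=-0.5, vmax=3.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # hatch lines are drawn in the edge color
ax.patch.set_hatch('xxxx')
fig.canvas.draw()  # render once; use plt.show() to view interactively
```

Zero-valued cells are drawn in the lightest color of the colormap (white for 'Greys'), which is why they cover the hatching while the gaps do not.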

PS: If your original data is in Excel, for example, you can save it as a csv file and load it with df = pd.read_csv(filename).

The code for a file similar to the one in the post could look like the following. To distinguish between 0 and a "data gap", missing data can be represented in the Excel file by an empty cell.

Empty rows for missing years can be added by assigning a new index.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# read the dataframe from a .csv file
pivoted = pd.read_csv('test.csv', index_col=0) # maybe: delimiter=';'
# extend the index to include all intermediate years
pivoted = pd.DataFrame(pivoted, index=range(pivoted.index.min(), pivoted.index.max() + 1))
# exchange columns and rows
pivoted = pivoted.T
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5,
                 cbar_kws={'ticks': np.arange(max_val+1)})
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
ax.collections[0].colorbar.outline.set_edgecolor('black')
plt.tight_layout()
plt.show()
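
To check the two data preparation steps without a file, here is a small sketch that reads csv text from a string. The layout (one row per year, one column per month) and the values are assumptions for illustration, not the asker's actual file. It uses reindex, which has the same effect as building a new DataFrame with an extended index.

```python
import io
import pandas as pd

# Hypothetical file contents: 1976 and 1977 are missing entirely,
# and the Jan cell of 1978 is left empty
csv_text = """year,Jan,Feb,Mar
1975,1,0,2
1978,,3,0
"""
table = pd.read_csv(io.StringIO(csv_text), index_col=0)

# add all-NaN rows for the years that are absent from the file
all_years = range(table.index.min(), table.index.max() + 1)
table = table.reindex(all_years)

# months as rows, years as columns, as expected by the heatmap code
pivoted = table.T
print(pivoted)
```

The empty cell and the added years all end up as NaN, so they are hatched, while the explicit zeros stay 0 and are drawn in white.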
