
I'm looking for help plotting a heat map that should look like this:

[Image: example of the desired heat map]

Along the x axis, the data is an array of years from 1975 to 2018: [1975, ..., 2018]

Along the y axis: an array of months (January to December)

For the cell values at each x-y intersection, as shown in the image, one can use 1, 2 or 3.

In the image I added, cross signs represent data gaps and white cells represent zero (0) values.
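A minimal sketch of the data layout described above, assuming a months × years table in which each cell holds a count from 0 to 3 and a data gap is stored as NaN. The random values and the gap years are placeholders, not the real data.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = list(range(1975, 2019))  # 1975..2018 inclusive

rng = np.random.default_rng(0)
# placeholder counts 0..3 (0 = no event, to be drawn white); float so NaN fits
grid = pd.DataFrame(rng.integers(0, 4, size=(len(months), len(years))).astype(float),
                    index=months, columns=years)

# hypothetical gap years: a whole column of NaN means "no data for that year"
gap_years = [1976, 1977, 1989]
grid[gap_years] = np.nan
```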

Update:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.read_csv('Events_in_Month_and_Year.xlsx', encoding='unicode_escape', error_bad_lines=False)
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months]  # change the order of the rows to be the same as months
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
spines = ax.collections[0].colorbar.ax.spines
for s in spines:
    spines[s].set_visible(True) # show border around colorbar
plt.tight_layout()
plt.show()

I have tried this code, but I get this error:

Error tokenizing data. C error: Buffer overflow caught - possible malformed input file

My data is stored in an .xlsx file that looks like this: [Image: screenshot of the Excel data]
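The error comes from passing an .xlsx file to `pd.read_csv`: an .xlsx workbook is a ZIP archive of XML files, not plain text, so the CSV tokenizer cannot parse it. A minimal sketch that picks the reader from the file extension; `load_table` is a hypothetical helper name, and reading .xlsx via `pd.read_excel` assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path, **kwargs):
    """Hypothetical helper: choose the pandas reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in ('.xlsx', '.xlsm', '.xls'):
        # Excel workbooks are binary (zipped XML); read_csv cannot tokenize them
        return pd.read_excel(path, **kwargs)
    # plain-text formats such as .csv or .txt go through the CSV reader
    return pd.read_csv(path, **kwargs)
```

Usage would then be e.g. `df = load_table('Events_in_Month_and_Year.xlsx')`.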

JohanC
Prater
  • https://seaborn.pydata.org/generated/seaborn.heatmap.html – Corralien Sep 28 '21 at 09:16
  • Hi, thanks for your answer. Can you tell me how the data should be saved (.txt, .xlsx or .doc) so that it can be read by the code? – Prater Sep 28 '21 at 09:26
  • Yes I have my data in excel file. – Prater Sep 29 '21 at 09:49
  • You need to do "save as" as a "csv" file. Then you can do `pivoted = pd.read_csv('Your_file.csv')`. As your csv already has the counts, there is no need to call `df.pivot_table()` nor to reorder the columns. Note that the file you posted doesn't seem to make a difference between `0` and `data gaps`. – JohanC Sep 29 '21 at 12:22
  • Thank you for your answer. The data gaps are generally the years that are not in the data file, e.g. 1976, 1977, 1989, etc. A zero in the data means no event in that month for the respective year, so I decided to represent it by a blank space, i.e. white color. – Prater Sep 30 '21 at 05:00
  • @JohanC I hope I am clearer about my query now. – Prater Sep 30 '21 at 11:35
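Following the comments above, one way to keep a true 0 (no event) distinct from a data gap is to leave gap cells empty in the CSV, since `pd.read_csv` turns empty fields into NaN. A small sketch with made-up inline data, assuming years as rows and months as columns:

```python
import io

import pandas as pd

# made-up sample: 1975 has a real zero in Feb, 1978 has an empty (missing) Feb cell
csv_text = """year,Jan,Feb,Mar
1975,1,0,2
1978,3,,1
"""
table = pd.read_csv(io.StringIO(csv_text), index_col=0)
# zeros stay 0, while the empty cell becomes NaN (and the column becomes float)
print(table)
```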

1 Answer

You can use sns.heatmap to create a heatmap. You can hatch the background via ax.patch.set_hatch('xx') (more x's give a denser hatch pattern). See the matplotlib hatch style gallery for more hatch options.
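This works because `sns.heatmap` masks NaN cells and leaves them undrawn, so the axes background patch (and its hatch) shows through exactly where data is missing. A minimal headless sketch on a made-up 2 × 2 table, using the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# made-up data: one missing cell (NaN) for Feb 1975
data = pd.DataFrame([[0, 1], [np.nan, 2]], index=['Jan', 'Feb'], columns=[1975, 1976])
fig, ax = plt.subplots()
sns.heatmap(data, ax=ax, cmap='Greys')
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # the hatch is drawn in the patch's edge color
ax.patch.set_hatch('xx')         # only visible where a cell was not drawn
mesh = ax.collections[0]         # the QuadMesh holding the heat map cells
```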

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame({'month': np.random.choice(months, 1000), 'year': np.random.randint(1975, 2019, 1000)})
pivoted = df.pivot_table(index='month', columns='year', aggfunc=len, fill_value=0)
pivoted = pivoted.loc[months].astype(float)  # reorder the rows to match months; float so NaN can be assigned below
for _ in range(20):
    # set some random locations to "not filled in"
    pivoted.iloc[np.random.randint(0, len(pivoted)), np.random.randint(0, len(pivoted.columns))] = np.nan
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5)
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
plt.tight_layout()
plt.show()
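The combination `plt.get_cmap('Greys', max_val + 1)` with `vmin=-0.5, vmax=max_val + 0.5` resamples the colormap to exactly one color per integer value, and shifting the limits by half a unit puts each integer in the middle of its own color band. A small check of that mapping, assuming values 0 to 3 as in the question:

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

max_val = 3                                  # values 0, 1, 2, 3 as in the question
cmap = plt.get_cmap('Greys', max_val + 1)    # 4 discrete grey levels
norm = Normalize(vmin=-0.5, vmax=max_val + 0.5)

for k in range(max_val + 1):
    # norm(k) = (k + 0.5) / 4 -> 0.125, 0.375, 0.625, 0.875: one value per band
    print(k, float(norm(k)), cmap(norm(k)))
```

With this choice the lowest level of 'Greys' is white, which matches drawing zero counts as blank cells.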

[Image: sns.heatmap with hatched background]

PS: If you have your original data, e.g. in Excel, you can save it as a csv file and load it with df = pd.read_csv(filename).

The code for a file similar to the one in the post could look like the following. To distinguish between 0 and a "data gap", missing data could be represented in the Excel file by an empty cell.

Empty rows for missing years can be added via assigning a new index.
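Adding the missing years can be sketched with `DataFrame.reindex`, which inserts all-NaN rows for index labels not present in the file. Made-up sample data, with years as rows as in the CSV layout:

```python
import numpy as np
import pandas as pd

# made-up sample: 1976 and 1977 are absent from the file
table = pd.DataFrame({'Jan': [1, 0, 2], 'Feb': [0, 3, 1]}, index=[1975, 1978, 1979])

# one row per year from the first to the last; absent years become all-NaN rows
full = table.reindex(range(table.index.min(), table.index.max() + 1))
heat = full.T  # months as rows, years as columns, as in the heat map
```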

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# read the dataframe from a .csv file
pivoted = pd.read_csv('test.csv', index_col=0) # maybe: delimiter=';'
# extend the index to include all intermediate years
pivoted = pd.DataFrame(pivoted, index=range(pivoted.index.min(), pivoted.index.max() + 1))
# exchange columns and rows
pivoted = pivoted.T 
max_val = np.nanmax(pivoted.to_numpy())
ax = sns.heatmap(pivoted, cmap=plt.get_cmap('Greys', max_val + 1), vmin=-0.5, vmax=max_val + 0.5,
                 cbar_kws={'ticks': np.arange(max_val+1)})
ax.patch.set_facecolor('white')
ax.patch.set_edgecolor('black')  # will be used for hatching
ax.patch.set_hatch('xxxx')
ax.collections[0].colorbar.outline.set_linewidth(1) # make outline visible
ax.collections[0].colorbar.outline.set_edgecolor('black')
plt.tight_layout()
plt.show()
JohanC