
I have a dataframe that contains 7 columns. The Regressor column has 3 different regressors (DT, DT-2, and DT-4).

I want to generate a correlation heatmap for each regressor, one subplot per regressor:

import matplotlib.pyplot as plt
import seaborn as sns

df_dt = df[(df["Regressor"]=="DT")]
df_dt_corr = df_dt.drop(["Regressor"], axis=1).corr(numeric_only=True)

df_dt2 = df[(df["Regressor"]=="DT-2")]
df_dt2_corr = df_dt2.drop(["Regressor"], axis=1).corr(numeric_only=True)

df_dt4 = df[(df["Regressor"]=="DT-4")]
df_dt4_corr = df_dt4.drop(["Regressor"], axis=1).corr(numeric_only=True)

#  SUBPLOTS
fig = plt.figure(figsize=(12,6))

plt.subplot(221)  
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap = 'Reds_r')

plt.subplot(222)  
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap = 'Blues_r')

plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap = 'BuGn_r')

plt.show()

The problem is that with 10 regressors I would have to repeat this code 10 times, once per regressor, which is neither Pythonic nor good programming practice.

Is there a Pythonic way to do the same job (e.g., using a loop)?

Please note: the demo dataframe has 3 regressors, but my main dataframe could have more. So I need a dynamic way to generate the plot based on whichever regressors are present.

Demo Data:

{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}
Opps_0

3 Answers


The existing answer uses a loop, but I looked into whether a faceted grid (seaborn's FacetGrid) could handle this instead. There is a great answer on this approach, which I've modified to fit your code. FacetGrid splits the single DataFrame into one panel per value of the categorical Regressor column, and col_wrap limits the number of panels per row. map_dataframe then draws a heatmap from each subset. However, I couldn't find a way to set a different colormap for each panel. I think a single shared colormap works well for analysis anyway.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

df = pd.DataFrame(data)

# One panel per regressor, wrapping after 2 panels per row
g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
# map_dataframe passes each subset as `data` (plus a `color` keyword that heatmap does not need);
# numeric_only=True skips the text columns Regressor and Method
g.map_dataframe(lambda data, color: sns.heatmap(data.corr(numeric_only=True), annot=True, fmt='.2f', square=True))
plt.show()
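On the colormap point, one possible workaround is sketched below. It relies on map_dataframe passing each subset as the `data` keyword, so the plotting function can read that subset's own Regressor value and look up a colormap for it. The `cmaps` mapping and the `corr_heatmap` and `plot_facets` helpers are hypothetical names (the colormaps are the ones from the question), and `corr(numeric_only=True)` assumes pandas 1.5 or later.

```python
import pandas as pd
import seaborn as sns

# Hypothetical mapping from regressor name to colormap (colors taken from the question)
cmaps = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def corr_heatmap(data, color=None, **kwargs):
    """Draw one facet: a correlation heatmap of the numeric columns in `data`.

    map_dataframe also passes `color` (and possibly other keywords); they are ignored.
    """
    regressor = data["Regressor"].iloc[0]  # every row in one facet has the same regressor
    sns.heatmap(
        data.corr(numeric_only=True),  # skips the text columns Regressor and Method
        annot=True,
        fmt=".2f",
        square=True,
        cmap=cmaps.get(regressor, "viridis"),  # fallback for regressors not in the mapping
    )


def plot_facets(df: pd.DataFrame) -> sns.FacetGrid:
    """One panel per regressor, at most 2 panels per row."""
    g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
    g.map_dataframe(corr_heatmap)
    return g
```

With the demo data this would be called as `g = plot_facets(pd.DataFrame(data))` followed by `plt.show()`. A dictionary lookup keeps the color choice explicit per regressor; for an unknown number of regressors, the mapping could instead be built from `df["Regressor"].unique()`.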


r-beginners

This is simply a case of putting everything inside a loop. First, the program finds the regressors it should use by getting all the unique values in df['Regressor'].

axes (the shape of the subplot grid) is computed automatically from the number of regressors: it is a square grid just large enough to hold them all.

The available colormaps are listed in colors; change this list if you want different colors. The first regressor gets the first colormap, the second gets the second, and so on. If there are more regressors than colormaps, it loops back to the start.

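import math

import matplotlib.pyplot as plt
import seaborn as sns

# unique() keeps the regressors in order of first appearance (a set has no reliable order)
regressors = df['Regressor'].unique()
fig = plt.figure(figsize=(12,6))

# square grid just big enough for one heatmap per regressor, e.g. 3 regressors -> 2x2
axes = (math.ceil(math.sqrt(len(regressors))),) * 2

colors = [
    'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
    'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
    'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']

for index, regressor in enumerate(regressors):
    df_dt = df[df['Regressor'] == regressor]
    # drop the label column; numeric_only=True also skips other text columns such as Method
    df_dt_corr = df_dt.drop(["Regressor"], axis=1).corr(numeric_only=True)

    plt.subplot(*axes, index + 1)
    plt.title('Regressor: ' + regressor)
    # cycle back to the first colormap when there are more regressors than colormaps
    sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index % len(colors)])
plt.show()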

I changed the way you call plt.subplot: the three-digit form you used (e.g. 221) only supports up to 9 subplots, and passing the rows, columns, and index as separate arguments makes it easy to set the grid size automatically.
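A related sketch swaps in matplotlib's object-oriented plt.subplots instead of plt.subplot: it creates the figure and the whole grid of Axes in one call, each Axes is handed to sns.heatmap through its `ax` parameter, leftover grid cells are hidden, and fig.suptitle adds one title for the whole figure (plt.title only titles the current Axes, which after the loop is the last subplot). The function `plot_corr_grid` and its parameters are hypothetical, and `corr(numeric_only=True)` assumes pandas 1.5 or later.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_corr_grid(df: pd.DataFrame, group_col: str = "Regressor", ncols: int = 2,
                   cmaps=None, title: str = "Correlation heatmaps"):
    """Draw one correlation heatmap per unique value of `group_col`.

    `cmaps` is an optional list of colormap names, reused cyclically when there
    are more groups than colormaps. Returns the figure and the 2-D array of Axes.
    """
    groups = df[group_col].unique()
    nrows = math.ceil(len(groups) / ncols)
    # squeeze=False always returns a 2-D array, even for a single row or column
    fig, axes = plt.subplots(nrows, ncols, figsize=(5 * ncols, 4 * nrows), squeeze=False)
    flat_axes = axes.ravel()

    for i, (name, ax) in enumerate(zip(groups, flat_axes)):
        subset = df[df[group_col] == name].drop(columns=group_col)
        corr = subset.corr(numeric_only=True)  # skips text columns such as Method
        cmap = cmaps[i % len(cmaps)] if cmaps else None  # None -> seaborn's default colormap
        sns.heatmap(corr, annot=True, fmt=".2f", square=True, cmap=cmap, ax=ax)
        ax.set_title(f"{group_col}: {name}")

    # hide grid cells left empty when the groups don't fill the last row
    for ax in flat_axes[len(groups):]:
        ax.set_visible(False)

    fig.suptitle(title)  # one title for the whole figure, not just the last subplot
    return fig, axes
```

For example, `fig, axes = plot_corr_grid(df, cmaps=colors)` followed by `plt.show()` would reuse the colormap list from this answer. Passing `ax=` explicitly means the code never depends on which Axes pyplot considers "current".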

sommervold
  • Thanks for the answer. But the code generates 2 figures: one is empty and one has the correlations. – Opps_0 Jul 19 '21 at 19:45
  • Also, if I want to add a title, where should I put `plt.title`? I am calling it before `plt.show`, but the title appears only on the 3rd correlation plot, not centred over the whole figure. – Opps_0 Jul 19 '21 at 19:49

Select the unique values first

I stored the unique values of the Regressor column in the vals variable, then looped over each value. See the solution below:

import math

import matplotlib.pyplot as plt
import seaborn as sns

# get the unique values in the "Regressor" column
vals = df['Regressor'].unique()

ncols = 2                             # number of subplot columns; choose as needed
nrows = math.ceil(len(vals) / ncols)  # just enough rows for every regressor

plt.figure(figsize=[10,10], dpi=200)
plt.suptitle("Correlation Map")  # super title for the whole figure
# start the loop for selecting data and plotting
for idx, value in enumerate(vals):
    # get the rows for this value and drop the non-numeric Regressor and Method columns with iloc
    data = df[df['Regressor'] == value].iloc[:, 2:]  # 2: selects the third column onwards
    # plot the correlation map
    plt.subplot(nrows, ncols, idx + 1)
    plt.title(f"Regressor={value}")
    sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
plt.show()

All you have to choose here is the number of columns in the subplot grid and the super title.
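The selection step can also be done with DataFrame.groupby, which yields each regressor's name together with its rows, so there is no separate unique() call or boolean filter per value. A minimal sketch, assuming pandas 1.5 or later (for `numeric_only`); `corr_by_group` is a hypothetical helper name.

```python
import pandas as pd


def corr_by_group(df: pd.DataFrame, group_col: str = "Regressor") -> dict:
    """Return {group value: correlation matrix of that group's numeric columns}."""
    # sort=False keeps the groups in order of first appearance, like Series.unique()
    return {
        name: group.drop(columns=group_col).corr(numeric_only=True)
        for name, group in df.groupby(group_col, sort=False)
    }
```

Each matrix can then be passed straight to sns.heatmap inside the plotting loop, e.g. `for idx, (value, corr) in enumerate(corr_by_group(df).items()):`. Dropping by name with numeric_only=True avoids relying on the label columns being the first two, which `iloc[:, 2:]` assumes.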


ArunRaj131