I have a pandas dataframe which has around 2000 rows and it looks like this:

    qba_alias_prob   y_score
0             0.3016  0.391449
1             0.8735  0.463365
2             0.5530  0.405257
3             0.5984  0.706649
4             0.7980  0.636735
...              ...       ...
1736          0.9902  0.749777
1737          0.4439  0.704889
1738          0.9694  0.814441
1739          0.9694  0.988001
1740          0.9358  0.781842

There are two columns, each containing a probability score between 0 and 1. I want to plot a heatmap of these two columns against each other (either column can go on either axis). I found several solutions, but they plot a heatmap of the index values against the column values of a dataframe, not one column against another.

I tried this code:

import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns

df= pd.read_csv('/Users/meenumee/Downloads/heatmap_toys.csv')

sns.heatmap(df, annot=True)


I got this output.

[image: heatmap output]

But I want the heatmap to have qba_alias_prob on one axis, say the X axis, and y_score on the Y axis. I also want a bigger heatmap that gives a cleaner view of the data.

As enke suggested, I tried:

sns.heatmap(df.set_index('qba_alias_prob'), annot=True)

I am getting this: [image: heatmap output]

But how can I make this easier to read? On the Y axis, the probabilities are in random order, and the plot is very messy.

  • You have defined what you want as X and Y, but what do you want as Z/color ? – Guimoute Mar 13 '22 at 14:51
  • Use [histogram2d](https://numpy.org/doc/stable/reference/generated/numpy.histogram2d.html) to get the density/heatmap, then you can plot it with `sns.heatmap` or `plt.imshow`. – Quang Hoang Mar 13 '22 at 14:54
  • @Guimoute there is no particular preference, but the colors should give a good view of the distribution. – Meenu Meena Mar 13 '22 at 14:56
  • @enke yeah. I mean, I still can't get a useful visualisation from this, though it did put them on two different axes. – Meenu Meena Mar 13 '22 at 15:02
  • Are you sure the probabilities are numeric? Why aren't they sorted when you plot? –  Mar 13 '22 at 15:13
  • @enke yes they are numeric. – Meenu Meena Mar 13 '22 at 15:18
  • What do you expect to achieve by annotating 2000 unique lines in a graph? I assume what you want is what Quang Hoang suggested. – Mr. T Mar 13 '22 at 15:48
  • Hey @Mr.T I want to see if there are some patterns in the data. – Meenu Meena Mar 13 '22 at 15:55
  • This is what Quang Hoang suggested - bin data to see in which bins more values are accumulated. So far, you just have colored your pandas dataframe. – Mr. T Mar 13 '22 at 16:06
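
For reference, here is a minimal sketch of the binning approach described in the comments: count how many rows fall into each cell of a grid over the two columns with `np.histogram2d`, then pass that count table to `sns.heatmap`. The helper names `bin_counts` and `plot_density`, the choice of 10 bins, the `viridis` colormap and the figure size are assumptions for illustration, and the random data only stands in for the real CSV.

```python
import numpy as np
import pandas as pd


def bin_counts(df, x_col="qba_alias_prob", y_col="y_score", bins=10):
    """Count rows per cell of a bins x bins grid over [0, 1] x [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    counts, _, _ = np.histogram2d(df[x_col], df[y_col], bins=[edges, edges])
    labels = [f"{lo:.2f}-{hi:.2f}" for lo, hi in zip(edges[:-1], edges[1:])]
    # histogram2d returns counts[x_bin, y_bin]; transpose so that
    # rows correspond to y_col bins and columns to x_col bins.
    return pd.DataFrame(counts.T.astype(int), index=labels, columns=labels)


def plot_density(counts, x_label="qba_alias_prob", y_label="y_score"):
    """Draw the count table as an annotated heatmap (hypothetical helper)."""
    # Imported here so the binning step works without a plotting backend.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 8))  # larger figure for readability
    sns.heatmap(counts, annot=True, fmt="d", cmap="viridis", ax=ax)
    ax.invert_yaxis()  # put the lowest y_col bin at the bottom
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig


# Stand-in data (assumption); replace with pd.read_csv(...) on the real file.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"qba_alias_prob": rng.random(2000),
                     "y_score": rng.random(2000)})
counts = bin_counts(demo, bins=10)
```

Calling `plot_density(counts)` followed by `plt.show()` would display the result. With this approach the color encodes the number of rows in each bin, which answers the "what is Z?" question from the first comment, and the axes come out sorted because the bins are built from ordered edges rather than from the raw column values.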
