50

I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.

ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax = ax)
plt.show()

What is the best way to annotate all the points for a multi-column DataFrame?

4 Answers4

59

Here's a (very) slightly slicker version of Dan Allan's answer:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string

df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)}, 
                  index=list(string.ascii_lowercase[:10]))

Which gives:

          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469

And then:

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

Finally, if you're in interactive mode you might need to refresh the plot:

fig.canvas.draw()

Which produces: Boring scatter plot

Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:

from matplotlib import cm
cmap = cm.get_cmap('Spectral')
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0, 
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')

Which looks a lot nicer: Nice scatter plot

Community
  • 1
  • 1
LondonRob
  • 73,083
  • 37
  • 144
  • 201
43

Do you want to use one of the other columns as the text of the annotation? This is something I did recently.

Starting with some example data

In [1]: df
Out[1]: 
           x         y val
 0 -1.015235  0.840049   a
 1 -0.427016  0.880745   b
 2  0.744470 -0.401485   c
 3  1.334952 -0.708141   d
 4  0.127634 -1.335107   e

Plot the points. I plot y against x, in this example.

ax = df.set_index('x')['y'].plot(style='o')

Write a function that loops over x, y, and the value to annotate beside the point.

def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)

draw()

Annotated Points

p_mcp
  • 2,643
  • 8
  • 36
  • 75
Dan Allan
  • 34,073
  • 6
  • 70
  • 63
34

Let's assume your df has multiple columns, and three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:

ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
tozCSS
  • 5,487
  • 2
  • 34
  • 31
12

I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.

The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.

Here was an alternative that avoids the loop using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill and couldn't get the colorbar to go away):

ax = df.plot('x', 'y', kind='scatter', s=50 )

def annotate_df(row):  
    ax.annotate(row.name, row.values,
                xytext=(10,-5), 
                textcoords='offset points',
                size=18, 
                color='darkslategrey')
    
_ = df.apply(annotate_df, axis=1)

enter image description here

Edit Notes

I edited my code example recently. Originally it used the same:

fig, ax = plt.subplots()

as the other posts to expose the axes, however this is unnecessary and makes the:

import matplotlib.pyplot as plt

line also unnecessary.

Also note:

  • If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
  • Depending on the points, you may have to play with the xytext values to get better placements.
Community
  • 1
  • 1
Alnilam
  • 3,121
  • 2
  • 21
  • 22