This is a very basic question but there must be something I'm missing. My data looks like this:

    x   y   z
0   2.7 0.3 a
1   3.4 0.4 b
2   15  1.9 b
3   3   0.4 c
4   7.4 0.8 a

Here z takes n qualitative values. I would like to plot (x, y) using z as a label (i.e. n different colours, etc.). What I do now is essentially restrict the data to each individual value of z, loop over those values, and draw one scatterplot at a time. Is there a quicker option?

EDIT: this is my current solution

# assumes df is indexed by z, so that df.xs(z) selects the rows for one label
for i, z in enumerate(["a", "b", "c", "d"]):
    df.xs(z).plot(kind="scatter", label=z, x="x", y="y", color=colours[i], ax=ax)

where colours and ax are defined elsewhere. The reasons why I dislike this solution are

  1. Why do I have to set the colours manually? I already have a palette, and normal plots already cycle through it.
  2. Why should I care about ax? Pandas should take care of everything.
  3. (most important!) I don't want to loop through either ["a", "b", "c", "d"] or set(df.z).
marco
  • I think this answers your question : http://stackoverflow.com/questions/15910019/annotate-data-points-while-plotting-from-pandas-dataframe/15911372#15911372 – euri10 Nov 28 '14 at 07:57
    I edited my question to explain where I am already. I am just surprised that Pandas doesn't provide such basic functionality. – marco Nov 28 '14 at 10:29

1 Answer

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
df = pd.DataFrame(data=[2.7, 3.4, 15, 3, 7.4], columns=['x'])
df['y'] = [0.3, 0.4, 1.9, 0.4, 0.8]
df['z'] = ['a', 'b', 'b', 'c', 'a']

# Plot y against x as unconnected markers
ax = df.set_index('x')['y'].plot(style='o')

def label_point(x, y, val, ax):
    """Write each value of val next to its (x, y) point on ax."""
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.z, ax)

plt.show()
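
The snippet above writes each point's z value next to it as text. If what you want is one colour per value of z, here is a minimal sketch, assuming matplotlib's default colour cycle is acceptable as the palette. The helper name scatter_by_category is hypothetical, not a pandas or matplotlib function. It still loops, but over df.groupby, so neither the labels nor the colours have to be listed by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_by_category(df, x, y, by, ax=None):
    """Draw one marker series per distinct value of df[by] (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    # groupby yields (label, sub-frame) pairs, sorted by label by default
    for label, group in df.groupby(by):
        # No colour is passed: each ax.plot call takes the next colour
        # from the axes' active property cycle
        ax.plot(group[x], group[y], marker="o", linestyle="", label=str(label))
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=by)
    return ax


# Sample data from the question
df = pd.DataFrame({
    "x": [2.7, 3.4, 15, 3, 7.4],
    "y": [0.3, 0.4, 1.9, 0.4, 0.8],
    "z": ["a", "b", "b", "c", "a"],
})
ax = scatter_by_category(df, "x", "y", "z")
```

Call plt.show() afterwards to display the figure. Passing an existing ax is optional, so several such plots can share one set of axes if needed.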
euri10