
I am using Pandas 0.18. I have a dataframe like this:

code    proportion    percent_highcost    total_quantity
A81     0.7           76                  1002
A81     0.0           73                  1400

And I am drawing a scatter plot like this:

colours = np.where(df['proportion'] > 0, 'r', 'b')  
df.plot.scatter(y='percent_highcost', x='total_quantity', c=colours)

This works well, but I don't know how to add a legend to indicate what the two colours mean.

I've tried plt.legend(['Non-dispensing', 'dispensing'], loc=1), but this produces an odd result, I guess because there's only one series.

Can anyone advise?

Richard
  • I suggest using `fig.colorbar(sc)` where `sc` is the artist `scatter`. You may have to use `ax.scatter` rather than `df.plot` to easily get access to that artist. – tacaswell May 05 '16 at 12:18
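
A minimal sketch of what this comment suggests, assuming the two rows from the question and colouring by the numeric `proportion` column instead of a red/blue flag. The `coolwarm` colormap and the marker size are arbitrary choices of mine, not something from the question:

```python
import matplotlib.pyplot as plt
import pandas as pd

# The two rows shown in the question.
df = pd.DataFrame({
    "proportion": [0.7, 0.0],
    "percent_highcost": [76, 73],
    "total_quantity": [1002, 1400],
})

fig, ax = plt.subplots()

# Call ax.scatter directly so the returned artist can be handed to colorbar.
sc = ax.scatter(df["total_quantity"], df["percent_highcost"],
                c=df["proportion"], cmap="coolwarm", s=100)
ax.set_xlabel("total_quantity")
ax.set_ylabel("percent_highcost")

# The colorbar takes the place of a legend for a continuous colour scale.
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label("proportion")
```

A colorbar shows a continuous scale, so it fits best when the actual `proportion` values matter rather than just "zero or not".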

1 Answer


Plot unique DataFrames on the same axis

To plot multiple series (not pandas Series) in one scatter plot, split the DataFrame by a condition and plot each part as its own scatter, with its own colour and label, on the same axis. This was shown in this answer. I will reproduce it here with your data.

Note: this was done in an IPython/Jupyter notebook

%matplotlib inline

import pandas as pd
from io import StringIO  # Python 3 (the original used cStringIO from Python 2)

# example data
text = '''
code    proportion    percent_highcost    total_quantity
A81     0.7           76                  1002
A81     0.0           73                  1400
A81     0.1           77                  1300
A81     0.0           74                  1200
A81     -0.1          78                  1350
'''

# read in example data (raw string so the regex escape is not mangled)
df = pd.read_csv(StringIO(text), sep=r'\s+')

print('Original DataFrame:')
print(df)
print()

# split the DataFrame into two DataFrames
condition = df['proportion'] > 0
df1 = df[condition].dropna()
df2 = df[~condition].dropna()

print('DataFrame 1:')
print(df1)
print()

print('DataFrame 2:')
print(df2)
print()

# Plot 2 DataFrames on one axis
ax = df1.plot(kind='scatter', x='total_quantity', y='percent_highcost', c='b', s=100, label='Non-Dispensing')
df2.plot(kind='scatter', x='total_quantity', y='percent_highcost', c='r', s=100, label='Dispensing', ax=ax)

Original DataFrame:
  code  proportion  percent_highcost  total_quantity
0  A81         0.7                76            1002
1  A81         0.0                73            1400
2  A81         0.1                77            1300
3  A81         0.0                74            1200
4  A81        -0.1                78            1350

DataFrame 1:
  code  proportion  percent_highcost  total_quantity
0  A81         0.7                76            1002
2  A81         0.1                77            1300

DataFrame 2:
  code  proportion  percent_highcost  total_quantity
1  A81         0.0                73            1400
3  A81         0.0                74            1200
4  A81        -0.1                78            1350

Two Series Scatter Plot
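
The same idea can also be written as a loop over groups with plain matplotlib, which avoids one variable per DataFrame. This is only a sketch: `scatter_by_flag` and `styles` are names I made up for illustration, and the colour/label mapping is simply copied from the code above, so swap it if your categories are the other way round.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Example data copied from the answer above.
df = pd.DataFrame({
    "proportion": [0.7, 0.0, 0.1, 0.0, -0.1],
    "percent_highcost": [76, 73, 77, 74, 78],
    "total_quantity": [1002, 1400, 1300, 1200, 1350],
})


def scatter_by_flag(data, styles):
    """Draw one scatter series per group so each gets its own legend entry.

    `styles` (hypothetical helper argument) maps the boolean flag
    `proportion > 0` to a (colour, label) pair.
    """
    fig, ax = plt.subplots()
    flag = data["proportion"] > 0
    for value, group in data.groupby(flag):
        colour, label = styles[bool(value)]
        ax.scatter(group["total_quantity"], group["percent_highcost"],
                   color=colour, s=100, label=label)
    ax.set_xlabel("total_quantity")
    ax.set_ylabel("percent_highcost")
    ax.legend(loc="upper right")  # one entry per labelled series
    return ax


# Assumed mapping: same colours and labels as the two plot calls above.
styles = {True: ("b", "Non-Dispensing"), False: ("r", "Dispensing")}
ax = scatter_by_flag(df, styles)
```

Because each `ax.scatter` call is a separate labelled series, `ax.legend()` has two handles to work with, which is what the single-series plot in the question was missing.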

tmthydvnprt