
I have data consisting of 3 columns:

zone | pop1 | pop2
----   ----   ----
3      4500   3800
2      2800   3100
1      1350   1600
2      2100   1900
3      3450   3600

I would like to draw a scatter plot of pop1 and pop2, with the circles having colors based on the value of zone.

I have the following code so far:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(file_path)
plt.scatter(df['pop1'], df['pop2'], s=1)

How can I give different colors, let's say red, green and blue, corresponding to zone values 1, 2 and 3 respectively?

SaadH

2 Answers


You can use the seaborn package, which is a wrapper around matplotlib. It offers a variety of features and attractive plots. Here is a simple example for your question.

import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
import seaborn as sns
import pandas as pd

# Same values as in the question (col1 = pop1, col2 = pop2, col3 = zone)
data = pd.DataFrame({'col1': [4500, 2800, 1350, 2100, 3450],
                     'col2': [3800, 3100, 1600, 1900, 3600],
                     'col3': [3, 2, 1, 2, 3]})

# fit_reg=False disables the regression line, so only the points are drawn
sns.lmplot(data=data, x='col1', y='col2', hue='col3',
           fit_reg=False, legend=True)
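
If the colors need to match the ones asked for in the question, `lmplot` also takes a `palette` argument. The following is a minimal sketch, assuming the same sample data and passing a dict that maps each `col3` value to a color name; the variable name `zone_palette` is only illustrative.

```python
import pandas as pd
import seaborn as sns

# Sample data from the question (col1 = pop1, col2 = pop2, col3 = zone)
data = pd.DataFrame({'col1': [4500, 2800, 1350, 2100, 3450],
                     'col2': [3800, 3100, 1600, 1900, 3600],
                     'col3': [3, 2, 1, 2, 3]})

# Illustrative mapping: one entry per value that appears in the hue column
zone_palette = {1: 'red', 2: 'green', 3: 'blue'}

# palette assigns a color to each hue level; fit_reg=False keeps only the points
g = sns.lmplot(data=data, x='col1', y='col2', hue='col3',
               palette=zone_palette, fit_reg=False, legend=True)
```

The dict should contain every value present in the hue column, since each hue level is looked up in it.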


Ankish Bansal
  • Can you control the colors here? Maybe it's not important for the OP, but the question specifies which color corresponds to each value. – krm Jan 09 '19 at 05:33

Without using an additional library, you can also go for something like:

colors = {1:'red', 2:'green', 3:'blue'}

for i in range(len(df)):
    plt.scatter(df['pop1'].iloc[i], df['pop2'].iloc[i],
                c=colors[df['zone'].iloc[i]])

EDIT: You don't need a loop; you can use something like this instead:

colors = {1:'red', 2:'green', 3:'blue'}

plt.scatter(df['pop1'], df['pop2'], 
            c=[colors[i] for i in df['zone']])

Which gives the output:

[image: scatter plot of pop1 vs. pop2 with points colored by zone]

This does require you to define a dictionary that maps each zone value to a color. Building the color list with a comprehension also adds a small amount of overhead.
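
The list comprehension can be avoided by letting pandas do the lookup. A minimal sketch, assuming the question's column names and rebuilding the sample data inline so that it runs on its own; `point_colors` is an illustrative name, not something from the original code:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'zone': [3, 2, 1, 2, 3],
                   'pop1': [4500, 2800, 1350, 2100, 3450],
                   'pop2': [3800, 3100, 1600, 1900, 3600]})

colors = {1: 'red', 2: 'green', 3: 'blue'}

# Series.map looks up every zone value in the dict in one call
point_colors = df['zone'].map(colors)

fig, ax = plt.subplots()
# A single scatter call draws all points, each with its own color
ax.scatter(df['pop1'], df['pop2'], c=point_colors.tolist())
```

Zone values missing from `colors` come back as NaN from `map`, so the dictionary needs an entry for every zone present in the data.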

krm
  • For large datasets, this approach would take a lot of time. – SaadH Jan 09 '19 at 06:11
  • You are correct. I have updated the answer to avoid the loop. There is still a list comprehension that needs to be done, but I expect that to be much faster than repeated calls to `plt.scatter` – krm Jan 09 '19 at 06:37
  • Yes, much faster now. I have selected it as the accepted solution. – SaadH Jan 11 '19 at 00:56