There are two ways that I know of to color plot points by factor and then also have a corresponding legend automatically generated. I'll give examples of both:
- Using ggplot2 (generally easier)
- Using R's built in plotting functionality in combination with the
colorRampPallete
function (trickier, but many people prefer/need R's built-in plotting facilities)
For both examples, I will use the ggplot2 diamonds dataset. We'll be using the numeric columns diamond$carat
and diamond$price
, and the factor/categorical column diamond$color
. You can load the dataset with the following code if you have ggplot2 installed:
library(ggplot2)
data(diamonds)
Using ggplot2 and qplot
It's a one liner. Key item here is to give qplot
the factor you want to color by as the color
argument. qplot
will make a legend for you by default.
qplot(
x = carat,
y = price,
data = diamonds,
color = diamonds$color # color by factor color (I know, confusing)
)
Your output should look like this:

Using R's built in plot functionality
Using R's built in plot functionality to get a plot colored by a factor and an associated legend is a 4-step process, and it's a little more technical than using ggplot2.
First, we will make a colorRampPallete
function. colorRampPallete()
returns a new function that will generate a list of colors. In the snippet below, calling color_pallet_function(5)
would return a list of 5 colors on a scale from red to orange to blue:
color_pallete_function <- colorRampPalette(
colors = c("red", "orange", "blue"),
space = "Lab" # Option used when colors do not represent a quantitative scale
)
Second, we need to make a list of colors, with exactly one color per diamond color. This is the mapping we will use both to assign colors to individual plot points, and to create our legend.
num_colors <- nlevels(diamonds$color)
diamond_color_colors <- color_pallet_function(num_colors)
Third, we create our plot. This is done just like any other plot you've likely done, except we refer to the list of colors we made as our col
argument. As long as we always use this same list, our mapping between colors and diamond$colors
will be consistent across our R script.
plot(
x = diamonds$carat,
y = diamonds$price,
xlab = "Carat",
ylab = "Price",
pch = 20, # solid dots increase the readability of this data plot
col = diamond_color_colors[diamonds$color]
)
Fourth and finally, we add our legend so that someone reading our graph can clearly see the mapping between the plot point colors and the actual diamond colors.
legend(
x ="topleft",
legend = paste("Color", levels(diamonds$color)), # for readability of legend
col = diamond_color_colors,
pch = 19, # same as pch=20, just smaller
cex = .7 # scale the legend to look attractively sized
)
Your output should look like this:

Nifty, right?