
I am relatively new to creating maps in R with ggplot2, and I have been struggling with this issue for a few days. I have created my plot, but I can't increase the number of bins used to map colors to my values.

This is a problem, because the map doesn't show the variation in the data well. I'm not sure whether I'm approaching this correctly.

Here is my code:

region=c('alaska','alabama','arkansas','arizona','california','colorado','connecticut','florida','georgia','hawaii','iowa','idaho','illinois','indiana','kansas','kentucky','louisiana','massachusetts','maryland','maine','michigan','minnesota','missouri','mississippi','montana','north carolina','north dakota','nebraska','new hampshire','new jersey','new mexico','nevada','new york','ohio','oregon','pennsylvania','south carolina','south dakota','tennessee','texas','utah','virginia','vermont','washington','wisconsin','west virginia','oklahoma','wyoming')

sales=c(46,1240,471,2292,13427,1574,261,10036,826,1508,184,939,2356,1329,434,271,714,208,2027,21,950,500,1871,147,249,1204,69,175,369,1968,606,656,2369,2422,525,2902,1709,126,1563,12046,931,2271,46,2260,250,122,0,0)

state_data = as.data.frame(cbind(region,sales))

library(ggplot2)
library(maps)
all_states <- map_data("state")

D = merge(all_states, state_data, by = "region")
D = D[with(D,order(D$group,D$order)),] 

p = ggplot()
p = p + geom_polygon( data=D, aes(x=long, y=lat, group = group, fill=D$sales),colour="white" )
p = p + xlab("")
p = p + ylab("")
p = p + labs(title = "sales")
p = p + guides(color=FALSE) 
p = p + guides(size=FALSE) 
p = p + guides(fill=guide_legend() )
p = p + guides(fill= guide_colorbar(title="sales",barheight = 1,barwidth=15,direction="horizontal",nbin=8) )
p = p + theme(legend.position="bottom")
p

Ideally, I'd like to increase the number of bins on the legend to about 8-10, and possibly add another color to the gradient to show more detail. I've experimented with various ggplot2 functions without much luck.

pb2q
Jonathan

1 Answer


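The problem you have is that your sales column is a factor, not numeric: cbind(region, sales) builds a character matrix, and as.data.frame() turns its columns into factors (the default before R 4.0). The following will remedy that: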

D = merge(all_states, state_data, by = "region")
D = D[with(D,order(D$group,D$order)),] 
D$sales = as.numeric(as.character(D$sales)) # the important bit: as.numeric() alone on a factor returns its level codes, not the values

p = ggplot(data=D) + 
     geom_polygon(aes(x=long, y=lat, 
                      group = group, fill=sales), 
                  colour = "white" ) + # Do not use D$, but use the column name
     xlab("") + ylab("") + labs(title = "sales") + theme(legend.position="bottom")
p

[Image: the resulting map]

...or with a diverging scale, which adds a midpoint color between the low and high colors:

p + scale_fill_gradient2(midpoint = median(D$sales))

[Image: the map with scale_fill_gradient2]
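To use more colors than scale_fill_gradient2's low/mid/high and to get more labels on the legend, one option is scale_fill_gradientn with an explicit breaks vector. The nbin argument of guide_colorbar only controls how finely the bar itself is drawn; the labeled ticks come from the scale's breaks. A minimal self-contained sketch using the question's data; the five colors and the 1500-step breaks are arbitrary choices of mine, not anything from the original post:

```r
library(ggplot2)
library(maps)

# Data from the question; data.frame() (unlike cbind()) keeps sales numeric
region <- c("alaska", "alabama", "arkansas", "arizona", "california",
            "colorado", "connecticut", "florida", "georgia", "hawaii",
            "iowa", "idaho", "illinois", "indiana", "kansas",
            "kentucky", "louisiana", "massachusetts", "maryland", "maine",
            "michigan", "minnesota", "missouri", "mississippi", "montana",
            "north carolina", "north dakota", "nebraska", "new hampshire", "new jersey",
            "new mexico", "nevada", "new york", "ohio", "oregon",
            "pennsylvania", "south carolina", "south dakota", "tennessee", "texas",
            "utah", "virginia", "vermont", "washington", "wisconsin",
            "west virginia", "oklahoma", "wyoming")
sales <- c(46, 1240, 471, 2292, 13427, 1574, 261, 10036, 826, 1508,
           184, 939, 2356, 1329, 434, 271, 714, 208, 2027, 21,
           950, 500, 1871, 147, 249, 1204, 69, 175, 369, 1968,
           606, 656, 2369, 2422, 525, 2902, 1709, 126, 1563, 12046,
           931, 2271, 46, 2260, 250, 122, 0, 0)
state_data <- data.frame(region, sales)

# Join to the state outlines; alaska and hawaii are not in map_data("state"),
# so the inner join drops them. Then restore the polygon drawing order.
D <- merge(map_data("state"), state_data, by = "region")
D <- D[order(D$group, D$order), ]

# Five colours instead of gradient2's low/mid/high (an arbitrary choice)
ramp <- c("lightyellow", "gold", "orange", "red", "darkred")

p <- ggplot(D, aes(x = long, y = lat, group = group, fill = sales)) +
  geom_polygon(colour = "white") +
  # the scale's breaks, not guide_colorbar(nbin = ...), set the legend labels
  scale_fill_gradientn(colours = ramp, breaks = seq(0, 12000, by = 1500)) +
  labs(x = NULL, y = NULL, title = "sales") +
  theme(legend.position = "bottom",
        # widen the legend keys so the extra labels have room
        legend.key.width = grid::unit(2, "cm"))
p
```

With data this skewed (a few states above 10000, most below 3000), a linear ramp like this still leaves most states in the pale end of the scale.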

Some style notes:

  • Do not use vectors (D$sales) as an aesthetic; use just the column name (sales), so ggplot2 looks it up in the data passed to ggplot().
  • I dislike the repeated p = p + ... style; instead, end each line with + and continue the plot on the next line.
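The conversion step matters. Below is a short base-R sketch, using three states from the question's data, of what cbind() and as.numeric() do to the sales column: as.numeric() on a factor returns the internal level codes, so a map built from it only spans 1 to the number of distinct sales values. Building the data frame with data.frame() instead of cbind() avoids the factor altogether. The stringsAsFactors = TRUE argument is my addition, to reproduce the pre-R-4.0 behaviour on current R.

```r
# Three states from the question, built the way the question builds them
region <- c("texas", "california", "wyoming")
sales  <- c(12046, 13427, 0)

m <- cbind(region, sales)   # one matrix, so everything is coerced to text
typeof(m)                   # "character"

# stringsAsFactors = TRUE reproduces the pre-R-4.0 default
state_data <- as.data.frame(m, stringsAsFactors = TRUE)
is.factor(state_data$sales)

# Wrong: returns the factor's internal level codes, not the sales values
wrong <- as.numeric(state_data$sales)
# Right: turn the labels back into text first, then into numbers
right <- as.numeric(as.character(state_data$sales))

# Simplest: data.frame() keeps each column's own type, so sales stays numeric
fixed <- data.frame(region, sales)
is.numeric(fixed$sales)
```

Here wrong is c(2, 3, 1), because the levels sort as "0", "12046", "13427", while right recovers the original numbers.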
Paul Hiemstra
  • Hey Paul, thanks a lot for your response. I think the data showing up as a factor is an error in the code I posted. I got similar results. The range of the sales data is [0,14000], but the image you posted shows it only going to 40 or 50. I was able to produce a similar map to the first one you posted, however when the scale is increased up to 14000, the fill gradient is reduced to only 3 buckets: [0,5000,10000,10000+]. – Jonathan Oct 15 '12 at 20:28
  • Then you need to make your example more realistic. In addition, I recently posted a question whose answer might interest you: http://stackoverflow.com/questions/12834802/non-linear-color-distribution-over-the-range-of-values-in-a-geom-raster. – Paul Hiemstra Oct 15 '12 at 20:46
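Because a few states (California, Texas, Florida) dwarf the rest, equal-width color steps over [0, 14000] put almost every state in the lowest bucket. One way around that, in the spirit of the non-linear color distribution linked above, is to let the data's quantiles decide where the colors change. A sketch under these assumptions: quantile_breaks is a hypothetical helper, 9 classes and the five-color ramp are arbitrary choices, and the quantiles are computed on the per-state values rather than on D, where each state is repeated once per outline point.

```r
# Per-state sales from the question (not D, which repeats each state
# once per outline point and would skew the quantiles)
sales <- c(46, 1240, 471, 2292, 13427, 1574, 261, 10036, 826, 1508,
           184, 939, 2356, 1329, 434, 271, 714, 208, 2027, 21,
           950, 500, 1871, 147, 249, 1204, 69, 175, 369, 1968,
           606, 656, 2369, 2422, 525, 2902, 1709, 126, 1563, 12046,
           931, 2271, 46, 2260, 250, 122, 0, 0)

# Hypothetical helper: quantile-based break points, rounded for readable
# legend labels. The outer breaks are pushed out to floor(min) and
# ceiling(max) so no value falls outside; unique() drops tied quantiles.
quantile_breaks <- function(x, n_classes = 9) {
  probs <- seq(0, 1, length.out = n_classes + 1)
  b <- round(quantile(x, probs = probs, names = FALSE))
  b[1] <- floor(min(x))
  b[length(b)] <- ceiling(max(x))
  unique(b)
}

# Option 1: discrete classes holding roughly equal numbers of states
brks <- quantile_breaks(sales)
sales_class <- cut(sales, breaks = brks, include.lowest = TRUE, dig.lab = 5)
table(sales_class)

# Option 2: keep a continuous scale, but place five colours at the
# quartiles, rescaled to [0, 1] as scale_fill_gradientn(values = ...) expects
ramp <- c("lightyellow", "gold", "orange", "red", "darkred")
q <- quantile(sales, probs = seq(0, 1, length.out = length(ramp)), names = FALSE)
pos <- (q - min(sales)) / (max(sales) - min(sales))
pos
```

To draw Option 1, build D as in the question (with sales numeric), add D$sales_class <- cut(D$sales, breaks = brks, include.lowest = TRUE, dig.lab = 5), map fill = sales_class, and add scale_fill_brewer(palette = "YlOrRd"), which offers up to 9 colors. For Option 2, keep fill = sales and add scale_fill_gradientn(colours = ramp, values = pos). The pos values assume the scale limits equal the data range (0 to 13427 here), which holds because the smallest and largest states both survive the merge with the state map.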