
I have been provided with some customer data in Latitude, Longitude, and Counts format. All the data I need to create a ggplot heatmap is present, but I do not know how to put it into the format ggplot requires.

I am trying to aggregate the data by total counts within 0.01 Lat and 0.01 Lon blocks (typical heatmap), and I instinctively thought "tapply". This creates a nice summary by block size, as desired, but the format is wrong. Furthermore, I would really like to have empty Lat or Lon block values be included as zeroes, even if there is nothing there... otherwise the heatmap ends up looking streaky and odd.

I have created a subset of my data for your reference in the code below:

# m is the matrix of data provided
m = matrix(c(44.9591051,44.984884,44.984884,44.9811399,
           44.9969096,44.990894,44.9797023,44.983334,
          -93.3120017,-93.297668,-93.297668,-93.2993524,
          -93.2924484,-93.282462,-93.2738911,-93.26667,
          69,147,137,22,68,198,35,138), nrow=8, ncol=3) 
colnames(m) <- c("Lat", "Lon", "Count")
m <- as.data.frame(m)
s = as.data.frame((tapply(m$Count, list(round(m$Lon,2), round(m$Lat,2)), sum)))
s[is.na(s)] <- 0

# Data frame "s" has all the data, but not exactly in the format desired...
# First, it has a column for each latitude, instead of one column for Lon
# and one for Lat, and second, it needs to have 0 as the entry data for 
# Lat / Lon pairs that have no other data. As it is, there are only zeroes
# when one of the other entries has a Lat or Lon that matches... if there
# are no entries for a particular Lat or Lon value, then nothing at all is
# reported.

desired.format = matrix(c(44.96,44.96,44.96,44.96,44.96,
    44.97,44.97,44.97,44.97,44.97,44.98,44.98,44.98,
    44.98,44.98,44.99,44.99,44.99,44.99,44.99,45,45,
    45,45,45,-93.31,-93.3,-93.29,-93.28,-93.27,-93.31,
    -93.3,-93.29,-93.28,-93.27,-93.31,-93.3,-93.29,
    -93.28,-93.27,-93.31,-93.3,-93.29,-93.28,-93.27,
    -93.31,-93.3,-93.29,-93.28,-93.27,69,0,0,0,0,0,0,
    0,0,0,0,306,0,0,173,0,0,0,198,0,0,0,68,0,0),
    nrow=25, ncol=3)

colnames(desired.format) <- c("Lat", "Lon", "Count")
desired.format <- as.data.frame(desired.format)

library(ggmap)  # provides get_map() and ggmap(); attaches ggplot2 as well
minneapolis = get_map(location = "minneapolis, mn", zoom = 12)
ggmap(minneapolis) + geom_tile(data = desired.format, aes(x = Lon, y = Lat, alpha = Count), fill="red")
halfer
rucker
  • Have you looked into more "native" solutions, like geom_hex and stat_density2d? – ako Jul 07 '14 at 01:21
  • Ako... no I have not. I am relatively new to this particular sort of visualization, and appreciate any new directions for investigation. I will look into these two. Thank you. – rucker Jul 07 '14 at 14:12
  • Ako... I just looked into geom_hex, and it seems that it is looking for the same format. The root problem is that ggplot seems to want an individual line item for each row, not count values per row. Making things worse is the fact that these count values need to be aggregated by spatial location to give the heatmap (or hexmap) desired. – rucker Jul 07 '14 at 14:40

1 Answer


Here is a stab with geom_hex and stat_density2d. The idea of making bins by truncating coordinates makes me a bit uneasy.

What you have is count data at given lat/long positions, so ideally you would pass Count as a weight. As far as I know, geom_hex does not implement a weight parameter. Instead, we hack it by repeating each row as many times as its Count value, similar to the approach here.

  ## hack job: repeat each record Count times
  m <- as.data.frame(m)
  m_long <- m[rep(seq_len(nrow(m)), m$Count), ]
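To see what the repetition does, here is a minimal base-R sketch that rebuilds the question's sample data and checks that the expanded data frame has one row per counted customer. It assumes Count holds non-negative whole numbers, which is true for the sample data but is an assumption about the full data set.

```r
# Sample data from the question, built directly as a data frame
m <- data.frame(
  Lat = c(44.9591051, 44.984884, 44.984884, 44.9811399,
          44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
          -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Repeat row i exactly Count[i] times
m_long <- m[rep(seq_len(nrow(m)), times = m$Count), ]

# The expanded frame has as many rows as the total count
nrow(m_long) == sum(m$Count)
```

Each row of m_long now stands for a single observation, which is the shape the density and hexbin stats expect.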


  ## stat_density2d
  library(ggplot2)
  ggplot(m_long, aes(Lon, Lat)) +   # longitude on x, latitude on y
  stat_density2d(aes(alpha=..level.., fill=..level..), size=2, 
                 bins=10, geom="polygon") +   # geom takes a single value
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(colour="black", bins=10) +
  geom_point(data = m_long)


  ## geom_hex alternative (needs the hexbin package installed)
  bins=6
  ggplot(m_long, aes(Lon, Lat)) +   # longitude on x, latitude on y
  geom_hex(bins=bins)+
  coord_equal(ratio = 1/1)+
  scale_fill_gradient(low = "blue", high = "red") +
  geom_point(data = m_long,position = "jitter")+
  ## label each hexagon with its count; the fixed size=3.5 sets the text size
  stat_binhex(aes(label=..count..), size=3.5,geom="text", bins=bins, colour="white")

These produce, respectively, the density plot and the binned version:

[Image: stat_density2d plot]
[Image: geom_hex plot]
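As an alternative to repeating rows, ggplot2's stat_summary_hex() can summarise a z aesthetic within each hexagon, so summing Count per hexagon gives weighted bins directly. This is only a sketch of a different technique, not the approach used above; it assumes the ggplot2 and hexbin packages are installed and reuses the question's sample data.

```r
library(ggplot2)  # stat_summary_hex() also needs the hexbin package installed

# Sample data from the question, one row per location with its Count
m <- data.frame(
  Lat = c(44.9591051, 44.984884, 44.984884, 44.9811399,
          44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
          -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Sum Count (the z aesthetic) inside each hexagon instead of counting rows
p <- ggplot(m, aes(x = Lon, y = Lat, z = Count)) +
  stat_summary_hex(fun = "sum", bins = 6) +
  scale_fill_gradient(low = "blue", high = "red") +
  coord_equal()

print(p)
```

The fill of each hexagon is then the total Count of the points falling inside it, without building m_long first.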

EDIT:

With basemap:

map <- ggmap(minneapolis)  # basemap object from the question (needs the ggmap package)

map + 
  stat_density2d(data = m_long,
                 aes(x = Lon, y = Lat, alpha=..level.., fill=..level..), 
                 size=2, 
                 bins=10, 
                 geom="polygon",   # geom takes a single value
                 inherit.aes=FALSE) + 
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(data = m_long, aes(x = Lon, y=Lat),
                 colour="black", bins=10,inherit.aes=FALSE) +
  geom_point(data = m_long, aes(x = Lon, y=Lat),inherit.aes=FALSE)


## and the hexbin map...

map + 
  geom_hex(bins=bins,data = m_long, aes(x = Lon, y = Lat),alpha=.5,
                 inherit.aes=FALSE) + 
  geom_point(data = m_long, aes(x = Lon, y=Lat),
             inherit.aes=FALSE,position = "jitter")+
  scale_fill_gradient(low = "blue", high = "red")

[Images: the density and hexbin plots drawn over the basemap]
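If the rectangular 0.01-degree grid from the question is still wanted, the long format with zero-filled cells can be built in base R. The following is a minimal sketch, not part of the plots above: it rounds to integer cell indices (the names LatCell, LonCell, sums, grid, and binned are made up for illustration), sums Count per cell, and merges the result onto a full grid so that empty cells become 0.

```r
# Sample data from the question
m <- data.frame(
  Lat = c(44.9591051, 44.984884, 44.984884, 44.9811399,
          44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
          -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Integer cell indices (hundredths of a degree) avoid floating-point
# mismatches when the sums are merged onto the grid
m$LatCell <- as.integer(round(m$Lat * 100))
m$LonCell <- as.integer(round(m$Lon * 100))

# Total Count per occupied cell
sums <- aggregate(Count ~ LatCell + LonCell, data = m, FUN = sum)

# Every cell between the observed extremes, occupied or not
grid <- expand.grid(
  LatCell = seq(min(m$LatCell), max(m$LatCell)),
  LonCell = seq(min(m$LonCell), max(m$LonCell))
)

# Left join: cells without data get NA, which is then set to 0
binned <- merge(grid, sums, all.x = TRUE)
binned$Count[is.na(binned$Count)] <- 0

# Convert the cell indices back to degrees and keep the three columns
binned$Lat <- binned$LatCell / 100
binned$Lon <- binned$LonCell / 100
binned <- binned[, c("Lat", "Lon", "Count")]
```

The resulting binned data frame has the same columns as desired.format in the question, so it can be passed to the geom_tile call shown there.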

ako
  • Ako, thank you! Those are exceptionally cool, and are even better than what I was trying to do! Even better, it handles the addition of the counts just fine natively! However, for my application, I really need the map underneath to guide interpretation. I tried to "drop in" the geom_hex code into my existing ggmap code, but it bombed out. Is it possible to put a map underneath either of these, and change the alpha (transparency) to allow visibility through the graph? – rucker Jul 07 '14 at 23:32
  • Now I can... My first answer (to a completely different question) was accepted, so now I can vote it up, accept and close. :-) Thanks so much. – rucker Jul 08 '14 at 12:17