
A while ago I found this code for mapping a variable to hexagon size, and I have tried to modify it to draw basketball shot charts. I know there are other threads like this one, but none of those I've read answered my question. The first one does help, but I'm stuck on one small problem:

Let's say I have a data frame of 250 observations (shots) with 4 variables: x, y, value (the points the shot is worth; in basketball that is either 2 or 3, depending on how far from the basket the shot is taken), and outcome (1 for a made shot, 0 for a miss).

Example:

         x         y value outcome
1 169.7650 -316.5726     3       0
2  75.0775 -182.3126     2       0
3  94.0150 -147.4050     2       1
4 109.1650 -138.0068     2       0
5  87.7025 -146.0624     2       1

# dput below:

structure(list(x = c(169.765, 75.0775, 94.015, 109.165, 87.7025), 
y = c(-316.5726, -182.3126, -147.405, -138.0068, -146.0624), 
value = c(3L, 2L, 2L, 2L, 2L), outcome = c(0L, 0L, 1L, 0L, 1L)), 
.Names = c("x", "y", "value", "outcome"), class = "data.frame", row.names = c(NA, -5L))

The y coordinates are negative because the origin (0, 0) is in the top left corner. With the code from the first thread I linked above I was able to bin my data, but I can't figure out how to operate on the binned data. This is what I have so far:

Imgur-Link

From this code:

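# devtools::install_git("https://github.com/hadley/densityvis.git")

library(densityvis)
library(ggplot2)  # needed for ggplot() and geom_polygon()

# Bin the shots; with frequency.to.area=TRUE the hexagon size follows the shot count
bin = hex_bin(df$x, df$y, var4=df$value, frequency.to.area=TRUE)
# Expand each bin into the six corner points of its hexagon
hexes = hex_coord_df(x=bin$x, y=bin$y, 
                     width=attr(bin,"width"), height=attr(bin,"height"),
                     size=bin$size)
# Repeat each bin's colour value once per corner
hexes$rightness = rep(bin$col, each=6)

ggplot(hexes, aes(x=x, y=y)) + geom_polygon(aes(fill=rightness, group=id))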

Here the size shows how many shots were taken from a given area, and the color shows the value of the shots from that area. What I want instead is points per shot: sum the points scored in each bin and divide by the number of shots taken there. The result ranges from 0 (no shots made) to 3 (every shot made from a 3-point area), and only bins with at least two shots taken should be displayed.

I know it is a lot to ask, and it's my problem that I can't do it on my own. But if anyone had the time, any help would be much appreciated.

Edit: I uploaded the CSV sample that created the above image here. I wasn't sure whether it's acceptable to post 300 lines of code in a question, which is why I link to geotheory's code here. My slightly modified example is in the code block above; I just ran

df <- read.csv("sample_data.csv", header=TRUE)

beforehand.

John Paper
  • Wouldn't it be great if you could make a small pie chart for every coordinate? Size would correspond to the number of shots taken and pieces of pie would correspond to proportion of 2/3 points per location. – Roman Luštrik Oct 13 '14 at 11:57
  • That's an interesting thought, thank you! Though I think it might be too much, because you'd have to study each bin separately to get an idea of how player X shoots from spot Y, whereas hexagons colored by efficiency (points per shot) give you a better overall view of where that player feels most comfortable taking his shots, no? – John Paper Oct 13 '14 at 12:10
  • @roman_luštrik check out the `ggsubplot` package – geotheory Oct 13 '14 at 12:16
  • @geotheory uff, that package is a lot of work. :) – Roman Luštrik Oct 13 '14 at 12:25
  • It might look it but I've actually found it pretty straightforward to e.g. overlay graphs on maps. Just look for a minimal example. – geotheory Oct 13 '14 at 12:30

1 Answer


As the `hex_bin` code stands, zero-value observations are filtered out. You can change this by removing the `& var4 > 0` condition from `clean_xy` (line 117 on GitHub). Then the following:

# Points scored by each shot: its value if made, 0 if missed
df$pts = ifelse(df$outcome == 1, df$value, 0)
bin = hex_bin(df$x, df$y, var4=df$pts, frequency.to.area=TRUE)
hexes = hex_coord_df(x=bin$x, y=bin$y, width=attr(bin,"width"), height=attr(bin,"height"), size=bin$size)
# Repeat each bin's colour value once per hexagon corner
hexes$points = rep(bin$col, each=6)
ggplot(hexes, aes(x=x, y=y)) + geom_polygon(aes(fill=points, group=id))

gives you:

[Image: the resulting hexagon plot]

Is that what you're looking for?
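
To make the target quantity explicit, here is a minimal base-R sketch of points per shot per bin. It is not what `hex_bin` does internally: it uses a plain square grid as a stand-in for the hexagons, and the `bin_width` of 25 and the column names `bin_x`, `bin_y`, `shots`, `points` and `pps` are my own assumptions for illustration. It keeps only bins with at least two shots taken:

```r
# Sample shots from the question
df <- data.frame(
  x = c(169.765, 75.0775, 94.015, 109.165, 87.7025),
  y = c(-316.5726, -182.3126, -147.405, -138.0068, -146.0624),
  value = c(3L, 2L, 2L, 2L, 2L),
  outcome = c(0L, 0L, 1L, 0L, 1L)
)

# Points actually scored by each shot (0 for a miss)
df$pts <- df$value * df$outcome

# Assumption: square bins of side bin_width stand in for the hexagons
bin_width <- 25
df$bin_x <- floor(df$x / bin_width)
df$bin_y <- floor(df$y / bin_width)

# Per bin: number of shots taken and total points scored
shots  <- aggregate(pts ~ bin_x + bin_y, data = df, FUN = length)
points <- aggregate(pts ~ bin_x + bin_y, data = df, FUN = sum)
names(shots)[3]  <- "shots"
names(points)[3] <- "points"
bins <- merge(shots, points, by = c("bin_x", "bin_y"))

# Points per shot, keeping only bins with at least two shots taken
bins$pps <- bins$points / bins$shots
bins <- bins[bins$shots >= 2, ]
print(bins)
```

With the five sample rows, only one bin contains two shots (both made two-pointers), so its points per shot is 2. The same per-bin division applies to hexagonal bins once each shot has a bin id.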

geotheory
  • It is, thank you so much! :) Could you tell me where I'd have to look if I wanted to filter out all bins with only one/two/three observations in them? – John Paper Oct 13 '14 at 15:36
  • The data is binned into the `binned` object in `hex_bin`. You'd just need to add `binned <- binned[binned$freq > 3,]` after that to filter out those rows. You may need to re-specify the plot coordinates if this filters out data that defines the extent of the ball court. – geotheory Oct 13 '14 at 15:43
  • All it does for me right now is [this](http://i.imgur.com/gmENRfH.png) with `binned <- binned[binned$freq > 2,]` where I was hoping the bins would stay where they were before and keep their size, just minus the smaller ones. But I'll figure it out. Thanks again, I really do appreciate your help! – John Paper Oct 13 '14 at 16:02
  • I think that is what you've asked it to do. Try using `coord_cartesian` with `xlim` and `ylim` arguments to specify the original plot limits. – geotheory Oct 13 '14 at 16:15
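
The two suggestions in the comments above can be sketched together as follows. This is only an illustration under assumptions: the `binned` data frame here is a hypothetical stand-in with bin centres `x`, `y` and a `freq` column (the real object lives inside `hex_bin` and its exact columns are not shown in this thread), and points are drawn instead of hexagon polygons. The idea is to record the plot limits before filtering and pass them to `coord_cartesian`, so the remaining bins stay where they were:

```r
# Hypothetical stand-in for the `binned` object inside hex_bin
binned <- data.frame(
  x    = c(10, 50, 90, 130),
  y    = c(-300, -200, -150, -100),
  freq = c(1, 4, 2, 5)
)

# Record the full extent before dropping any bins
x_limits <- range(binned$x)
y_limits <- range(binned$y)

# Keep only bins with more than two observations
filtered <- binned[binned$freq > 2, ]

# Plot the remaining bins with the original limits (requires ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(filtered, aes(x = x, y = y)) +
    geom_point(aes(size = freq)) +
    coord_cartesian(xlim = x_limits, ylim = y_limits)
  print(p)
}
```

Because `coord_cartesian` only changes the visible window and does not drop data, the filtered bins keep their positions relative to the full court.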