
I have data like the following:

data_ex <- data.frame(x = runif(1000, 0, 10),
                      y = runif(1000, 0, 10),
                      z = runif(1000, 0, 1))

These are points (x, y) inside a square (it could also be a rectangle), each with some value z. I want to divide this area into 100 smaller squares (or rectangles) and average the z values within each one, so I did the following:

data_ex <- data_ex %>% 
  mutate(x2 = cut(x, breaks = 0:10), 
         y2 = cut(y, breaks = 0:10)) %>%
  group_by(x2, y2) %>%
  mutate(z = mean(z)) %>% 
  ungroup()

Now I want to plot it, using the averaged z value as the colour of each small square (rectangle). I could potentially use geom_tile for this (as shown below), but it needs the centers of the tiles as input.

data_ex %>% 
  ggplot() +
  geom_rect(aes(xmin = 0, xmax = 10, ymin = 0, ymax = 10), fill = 'white') +
  geom_tile(aes(x_center, y_center, fill = z))

I could probably extract them as the centers of the x2 and y2 intervals, but that seems a little cumbersome. Is there a quicker way to do the calculation, or a different way to make the desired plot?

jakes

3 Answers


You can make use of the floor and ceiling functions to create arbitrary rectangle sizes, then calculate the midpoint of those intervals. I've modified your second code block a little:

data_ex <- data_ex %>% 
  mutate(x2 = cut(x, breaks = 0:10), 
         y2 = cut(y, breaks = 0:10)) %>%
  group_by(x2, y2) %>%
  mutate(mean_z = mean(z),
         x_mid = floor(x) + (ceiling(x) - floor(x))/2,
         y_mid = floor(y) + (ceiling(y) - floor(y))/2,
         height = ceiling(y) - floor(y),
         width = ceiling(x) - floor(x)) %>%
  ungroup()

Then plot, passing height and width to aes() in geom_tile():

data_ex %>% 
  ggplot() +
  geom_rect(aes(xmin = 0, xmax = 10, ymin = 0, ymax = 10), fill = 'white') +
  geom_tile(aes(x = x_mid, y = y_mid, height = height, width = width, fill = mean_z))

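The same code can be run with non-square bins by giving cut different breaks for x and y. Note that the floor/ceiling midpoints still describe unit cells, so in the example below only the grouping (and hence mean_z) follows the wider y bins: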

data_ex <- data_ex %>% 
      mutate(x2 = cut(x, breaks = 0:10), 
             y2 = cut(y, breaks = c(0,2,4,6,8))) %>%
      group_by(x2, y2) %>%
      mutate(mean_z = mean(z),
             x_mid = floor(x) + (ceiling(x) - floor(x))/2,
             y_mid = floor(y) + (ceiling(y) - floor(y))/2,
             height = ceiling(y) - floor(y),
             width = ceiling(x) - floor(x)) %>%
      ungroup()

data_ex %>% 
      ggplot() +
      geom_rect(aes(xmin = 0, xmax = 10, ymin = 0, ymax = 10), fill = 'white') +
      geom_tile(aes(x = x_mid, y = y_mid,height = height, width = width, fill = mean_z))
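
If the tile geometry should follow the `cut()` breaks themselves rather than unit cells, the midpoints and sizes can be derived from the break vector. The following is a minimal base-R sketch, not part of the answer above: `bin_geometry` is a hypothetical helper name, and the y breaks are extended to 10 (an assumption) so that points with y > 8 are not dropped. It also collapses the data to one row per cell.

```r
# Hypothetical helper (not from the answer): for each value, return the
# midpoint and size of the (lower, upper] interval it falls in, mirroring
# cut()'s default right-closed intervals. Values outside the breaks give NA.
bin_geometry <- function(v, breaks) {
  i <- findInterval(v, breaks, left.open = TRUE)
  ok <- !is.na(i) & i >= 1 & i < length(breaks)
  i[!ok] <- NA
  data.frame(mid  = (breaks[i] + breaks[i + 1]) / 2,
             size = breaks[i + 1] - breaks[i])
}

set.seed(1)
data_ex <- data.frame(x = runif(1000, 0, 10),
                      y = runif(1000, 0, 10),
                      z = runif(1000, 0, 1))

x_breaks <- 0:10
y_breaks <- c(0, 2, 4, 6, 8, 10)  # assumption: extended to cover the full range

gx <- bin_geometry(data_ex$x, x_breaks)
gy <- bin_geometry(data_ex$y, y_breaks)

# One row per cell: mean z plus the cell's centre and dimensions.
# The formula interface drops rows with NA (points outside the breaks).
cells <- data.frame(x_mid = gx$mid, y_mid = gy$mid,
                    width = gx$size, height = gy$size, z = data_ex$z)
binned <- aggregate(z ~ x_mid + y_mid + width + height, data = cells, FUN = mean)
```

The resulting `binned` frame could then be passed to `geom_tile(aes(x = x_mid, y = y_mid, width = width, height = height, fill = z))`, which draws one tile per cell instead of one per point.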
johnckane
  • Did you check the output? Here's what I get: https://imgur.com/a/98EgkxI The small squares don't cover the whole big square and are shifted slightly toward the bottom-left corner. This is because `geom_tile` requires the `x`, `y` of the tiles' centres. – jakes Feb 26 '19 at 20:22
  • Yes, I see that now. I've updated the code to calculate the midpoint of the intervals. – johnckane Feb 26 '19 at 21:10
  • Thanks. The second issue is that it seems to plot every single point, only coloured by tile membership, which can be time-consuming with large datasets (see https://imgur.com/a/oOKrWvl). I'd prefer a solution that summarises the values and therefore limits the number of observations to plot. – jakes Feb 27 '19 at 06:02
  • Looks like your data is actual spatial data from a soccer field with sparsity in some locations, rather than more uniformly complete data across the plotting domain like in the example you've provided. Have you attempted to use different values of `breaks` in the `cut` call in `mutate`? That would make the rectangles not the same size, but would allow wider coverage and could eliminate the white spaces in your graph. Try it out in the third code block from my answer. – johnckane Feb 27 '19 at 17:18

Edit: OP requested a way to make the binning work for any arbitrary scale and bin size.

The binning could be made flexible with a function:

library(tidyverse)
bin_df <- function(df, x_binwidth, y_binwidth) {
  df %>%
    mutate(x2 = x_binwidth * (floor(x/x_binwidth) + 0.5), 
           y2 = y_binwidth * (floor(y/y_binwidth) + 0.5)) %>%
    group_by(x2, y2) %>%
    summarize(z = mean(z)) %>% 
    ungroup()
}
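
To see what the `x2`/`y2` expressions in `bin_df` compute, here is a small worked sketch. `bin_center` and its `upper` argument are hypothetical names introduced for illustration; the clamp assumes `upper` is a multiple of the bin width and that bins start at 0, as in `bin_df`.

```r
# Hypothetical helper: centre of the bin of width `binwidth` that contains v,
# using the same arithmetic as bin_df() (bins start at 0).
bin_center <- function(v, binwidth, upper = Inf) {
  center <- binwidth * (floor(v / binwidth) + 0.5)
  # Assumption: `upper` is a multiple of binwidth. A value lying exactly on
  # `upper` would otherwise open a new bin, so it is pulled into the last one.
  pmin(center, upper - binwidth / 2)
}

bin_center(c(0.2, 3.7, 9.99), binwidth = 2)   # 1 3 9
bin_center(c(0.2, 3.7), binwidth = 0.25)      # 0.125 3.625
bin_center(10, binwidth = 2)                  # 11, outside the 0-10 range
bin_center(10, binwidth = 2, upper = 10)      # 9
```

Because the centre is computed from `v / binwidth`, the bin width does not need to be an integer or match the 0-10 scale.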

data_ex %>%
  bin_df(x_binwidth = 1, y_binwidth = 1) %>%
  ggplot() +
  geom_tile(aes(x2, y2, fill = z)) +
  scale_x_continuous(breaks = 0:10)


data_ex %>%
  bin_df(x_binwidth = 2, y_binwidth = 2) %>%
  ggplot() +
  geom_tile(aes(x2, y2, fill = z)) +
  scale_x_continuous(breaks = 0:10)


Jon Spring
  • For the fake data, there are no values of 10 (or at least a very low chance of any arising); if your real data has values at the edges you might want to add a special case between the `mutate` and the `group_by` that assigns the value of exactly 10 to the 9.5 bin, e.g. `mutate(x2 = pmin(9.5, x2), y2 = pmin(9.5, y2))`, or use an `if_else` to do that. – Jon Spring Feb 27 '19 at 17:17
  • This is nice, but `floor` does the trick only with integers and 0-10 scale. If the scale is different and you want to have a grid of arbitrary size this approach fails. – jakes Feb 27 '19 at 18:27
  • Could you change the units on your values of x and y so as to make them integers? For example convert values of meters between 0 and 1, to be centimeters valued between 0 and 100? – johnckane Feb 27 '19 at 20:07
  • Updated answer to work for arbitrary bin dimensions. – Jon Spring Feb 27 '19 at 22:11

Since not everything has to be done in ggplot2, here is an alternative solution based on the sp and raster packages.

Here is the code:

library(sp)
library(raster)

set.seed(2222)

# Let's create 10 x 15 tiles
NCOLS = 10
NROWS = 15

data_ex <- data.frame(x = runif(1000, 0, 10),
                      y = runif(1000, 0, 10),
                      z = runif(1000, 0, 1))

# Create spatial points
dat_sp <- SpatialPointsDataFrame(data_ex[, 1:2], data = data_ex["z"])

# Create reference raster
r <- raster(ncols = NCOLS, nrows = NROWS, ext = extent(c(0, 10, 0, 10))) 

# Convert to a raster with z averaging
# Also could be any aggregation function like min, max, etc.
dat_rast <- rasterize(dat_sp, r, field = "z", fun = mean)

# Plot with base graphics
plot(dat_rast)

Here is the result: (figure: result with base graphics)

If you still want to plot it with ggplot, you can use the rasterVis package:

# Plot with ggplot2
library(ggplot2)
library(rasterVis)

gplot(dat_rast) + geom_tile(aes(fill = value))


Istrel