0

I'm working on a project where I have a very large amount of points and I am looking to identify regions (defined by a lack of clustering) where the density of these points is statistically significantly less relative to others. Normally a visual would be enough but I have so many points that it is to difficult to tell where these empty spaces are and a density heat map doesn't help me zero in on smaller regions. Maybe I'm missing something very simpler here, but I am hoping someone can at least send me in the right direction of where to look. Below is a reproducible sample quick and dirty lets take these points from open data and map them to the borough file for NYC:

#libraries--------------------------

library(ggplot2)
library(ggmap)
library(sp)
library(jsonlite)
library(RJSONIO)
library(rgdal)

#call api data--------------------------

df =  fromJSON("https://data.cityofnewyork.us/resource/24t3-xqyv.json?$query= SELECT Lat, Long_")
df <- data.frame(t(matrix(unlist(df),nrow=length(unlist(df[1])))))
names(df)[names(df) == 'X2'] = 'x'
names(df)[names(df) == 'X1'] = 'y'
df = df[, c("x", "y")]
df$x = as.numeric(as.character(df$x))
df$y = as.numeric(as.character(df$y))
df$x = round(df$x, 4)
df$y = round(df$y, 4)
df$x[df$x < -74.2] = NA
df$y[df$y < 40.5] = NA
df = na.omit(df)

#map data----------------------------


cd = readOGR("nybb.shp", layer = "nybb")
cd = spTransform(cd, CRS("+proj=longlat +datum=WGS84"))
cd_f = fortify(cd)


#map data
nyc = ggplot() +
  geom_polygon(aes(x=long, 
                   y=lat, group=group), fill='grey', 
               size=.2,color='black', data=cd_f, alpha=1)


nyc + geom_point(aes(x = x, y = y), data = df, size = 1)

#how would I go about finding the empty spaces? That is the regions where there are no clusters?

In this case there aren't a lot of points but for the sake of demonstration, how would I:

  1. identify pockets of low density
  2. potentially draw polygon boundaries on those pockets?

Appreciate the help!

LoF10
  • 1,907
  • 1
  • 23
  • 64

1 Answers1

0

One way to get polygonal areas of low density would be to construct the Dirichlet/Voronoi tesselation and choose the largest ones.

Below I use spatstat and deldir (loaded by spatstat) to do this. It is not so fast so with many more points I don't know how well it will go.

To use the results in ggmap and other spatial packages you can convert back from owin and ppp to the spatial classes from sp and use spTransform to get lat, long coordinates.

First load the packages:

library(maptools)
library(spatstat)
library(jsonlite)

Map and points in coordinates of shapefile (note I read in data from local files downloaded from www):

cd = readOGR("nybb.shp", layer = "nybb")
#> OGR data source with driver: ESRI Shapefile 
#> Source: "nybb.shp", layer: "nybb"
#> with 5 features
#> It has 4 fields
df <- fromJSON("NYC_data.json")
df <- as.data.frame(matrix(as.numeric(unlist(df)), ncol = 2, byrow = TRUE))
df <- df[, c(2, 1)]
names(df) <- c("x", "y")
df <- df[df$x > -74.2 & df$y > 40.5, ]
coordinates(df) <- ~x+y
proj4string(df) <- CRS("+proj=longlat +datum=WGS84")
df2 <- spTransform(df, proj4string(cd))

Switch to spatstat classes:

cd2 <- as(cd, "SpatialPolygons")
W <- as(cd2, "owin")

X <- as(df2, "ppp")
Window(X) <- W

plot(X, main = "")

Compute Dirichlet tessellation and areas and plot the tessellation:

d <- dirichlet(X)
#> Warning: 96 duplicated points were removed
a <- tile.areas(d)
plot(d, main = "")

Combine the n_areas biggest areas of the tessellation:

n_areas <- 30
empty <- tess(tiles = d$tiles[tail(order(a), n = n_areas)])
empty2 <- as.owin(empty)

Plot the result:

plot(W, main = "")
plot(empty2, col = "red", add = TRUE)
plot(X, add = TRUE)

Ege Rubak
  • 4,347
  • 1
  • 10
  • 18