0

I have the following boundary dataset for the United Kingdom, which shows all the counties:

library(raster)
library(sp)
library(ggplot)

# Download the data
GB <- getData('GADM', country="gbr", level=2)

Using the subset function it is really easy to filter the shapefile polygons by an attribute in the data. For example, if I want to exclude Northern Ireland:

GB_sub <- subset(UK, NAME_1 != "Northern Ireland")

However, there are lots of small islands which distort the scale data range, as shown in the maps below:

enter image description here

Any thoughts on how to elegantly subset the dataset on a minimum size? It would be ideal to have something in the format consistent with the subset argument. For example:

GB_sub <- subset(UK, Area > 20) # specify minimum area in km^2
Michael Harper
  • 14,721
  • 2
  • 60
  • 84
  • You can crop it with another raster of Great Britain mainland. This post [link](https://stackoverflow.com/questions/16641151/clipping-rasters-in-r) could be helpful. – patL Nov 02 '17 at 15:25

2 Answers2

1

This is one potential solution:

GB_sub = GB[sapply(GB@polygons, function(x) x@area>0.04),] # select min size
map.df <- fortify(GB_sub)
ggplot(map.df, aes(x=long, y=lat, group=group)) + geom_polygon()

Check this link for specifics on the actual interpretation of km2 size: Getting a slot's value of S4 objects?

I compared both as well but they don't seem to differ:

out1 = sapply(GB@polygons, function(x) x@area)
out2 = rgeos::gArea(GB, byid=TRUE)
timfaber
  • 2,060
  • 1
  • 15
  • 17
  • Having read around a bit, the approach for `x@area` can be problematic for shapefiles with a hole in. However, it is not a problem in this scenario. – Michael Harper Nov 02 '17 at 15:39
  • Yes indeed, the link inside my answer describes this particular issue quite nicely – timfaber Nov 02 '17 at 15:47
1

Here is another potential solution. Because your data is in lat-long projection, directly calculating the area based on latitude and longitude would cause bias, it is better to calculate the area based on functions from the geosphere package.

install.packages("geosphere")
library(geosphere)

# Calculate the area
GB$poly_area <- areaPolygon(GB) / 10^6

# Filter GB based on area > 20 km2
GB_filter <- subset(GB, poly_area > 20)

poly_area contains the area in km2 for all polygons. We can filter the polygon by a threshold, such as 20 in your example. GB_filter is the final output.

Michael Harper
  • 14,721
  • 2
  • 60
  • 84
www
  • 38,575
  • 12
  • 48
  • 84