Get more aggregate shapes from shapefile

Question

As described in this previous question, I'm plotting German zip codes. The most granular level is 5-digit, e.g. 10117. I would like to draw boundaries defined at the two-digit level, while keeping the 5-digit granularity of the zip codes for coloring.

Here again is the first part of my code, which colors in the zip codes:

# for loading our data
library(raster)
library(readr)
library(readxl)
library(sf)
library(dplyr)

# for datasets
library(maps)
library(spData)

# for plotting
library(grid)
library(tmap)
library(viridis)

Get shape files for Germany (link). In Germany zip codes are called Postleitzahlen (PLZ).

germany <- read_sf("data/OSM_PLZ.shp")

Classify PLZs into arbitrary groups for plotting.

germany <- germany %>% 
  mutate(plz_groups = case_when(
    substr(plz, 1, 1) == "1" ~ "Group A",
    substr(plz, 2, 2) == "2" ~ "Group B",    
    TRUE ~ "Group X" # rest
  ))

Make plot filling by PLZ:

tm_shape(germany) +
  tm_fill(col = "plz_groups")

The boundaries I want to draw are the following:

I managed to draw German state boundaries, building on the answer in the other post. But for this I used a different shape file. I haven't found existing shapefiles on the level I'm looking for.

Can I use the granular shapefile I already have and somehow aggregate it up to the 2-digit zip code level?

score 5 · Accepted Answer · answered Feb 14 '23 at 16:59

You can create a new variable on your object containing the first two characters of the zip code, then use dplyr::group_by() with summarise() to combine the zip codes that share a prefix into one zone.

See a reprex with just a subset of the zip codes, since this operation takes a long time on your dataset:

url <- "https://opendata.arcgis.com/api/v3/datasets/5b203df4357844c8a6715d7d411a8341_0/downloads/data?format=geojson&spatialRefId=4326&where=1%3D1"

tmp <- "DEU.geojson"
download.file(url, tmp, mode = "wb")

library(sf)
library(dplyr)
library(tmap)

germany <- st_read(tmp) |> st_make_valid()

# Create an index
germany$plz_groups <- substr(germany$plz, 1, 2)

# order levels
lvs <- unique(sort(germany$plz_groups))

germany$plz_groups <- factor(germany$plz_groups, levels = lvs)


# Group with dplyr

germany_group <- germany %>%
  # For the example, keep only the first 8 groups
  # You should remove this filter
  filter(plz_groups  %in% lvs[1:8]) %>%
  select(plz_groups) %>%
  group_by(plz_groups) %>%
  summarise(n = n()) %>%
  # Additionally remove holes
  nngeo::st_remove_holes()

tm_shape(germany_group) +
  tm_polygons() +
  tm_shape(germany_group) +
  tm_text(text = "plz_groups")
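To connect this back to the question (2-digit boundaries drawn over a 5-digit fill), here is a minimal sketch on toy data. The `unit_square()` helper, the four squares and their codes are made up for illustration and are not part of the real dataset. It shows that `summarise()` on a grouped `sf` object unions the geometries, and that the dissolved outline can then be drawn on top of the detailed layer.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a unit square with its lower-left corner at (x, y)
unit_square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}

# Toy data: four adjacent squares with made-up 5-digit codes
toy <- st_sf(
  plz = c("01067", "01069", "10115", "10117"),
  geometry = st_sfc(
    unit_square(0, 0), unit_square(1, 0),
    unit_square(0, 1), unit_square(1, 1)
  )
)

# Dissolve by 2-digit prefix: summarise() unions the geometries per group
toy_2digit <- toy %>%
  mutate(plz2 = substr(plz, 1, 2)) %>%
  group_by(plz2) %>%
  summarise(n = n())

# Detailed layer as fill, dissolved layer as thick outline on top
plot(toy["plz"], border = "grey70", reset = FALSE)
plot(st_geometry(toy_2digit), lwd = 3, add = TRUE)
```

With tmap, the same idea would be a second `tm_shape()` on the dissolved object followed by `tm_borders()`, added after the `tm_fill()` layer of the detailed object.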

Also, the dataset is not perfect (note that I used the `geojson` version, since `.shp` is an old, multi-file format that I personally hate). Polygons are not completely aligned and the geometry itself is not valid, which is why I needed `sf::st_make_valid()` and `nngeo::st_remove_holes()` to get meaningful results. I think you'll face further problems due to the (expected) lack of spatial validity in the rest of the zip codes as well. (dieghernan, Feb 14 '23 at 17:02)
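As a rough illustration of the validity issue mentioned in the comment, the sketch below builds a deliberately invalid polygon (a self-intersecting "bowtie", made up for this example), checks it with `sf::st_is_valid()` and repairs it with `sf::st_make_valid()`. The real PLZ geometries may be invalid for other reasons, so treat this only as a way to diagnose the problem before dissolving.

```r
library(sf)

# A self-intersecting "bowtie" ring: a classic invalid polygon
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

# FALSE for this ring; reason = TRUE returns a text description instead
st_is_valid(bowtie)
st_is_valid(bowtie, reason = TRUE)

# Repair, then check again
fixed <- st_make_valid(bowtie)
st_is_valid(fixed)
```

On the full dataset, `table(st_is_valid(germany))` gives a quick count of how many features need repair.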

Robert Hijmans · Answer 2 · 2023-02-14T18:39:50.067

The data

url <- "https://services2.arcgis.com/jUpNdisbWqRpMo35/arcgis/rest/services/PLZ_Gebiete/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
s <- sf::st_read(url)

Process with terra

library(terra)
v <- vect(s)
# create codes and subset
v$plz1 = substr(v$plz, 1, 1)
v$plz2 = substr(v$plz, 1, 2)
v <- v[, c("plz1", "plz2")]

# aggregate and remove small holes
plz2 <- aggregate(v, "plz2") |> fillHoles()
plz1 <- aggregate(plz2, "plz1") |> fillHoles()

# create variable "group"
i <- as.numeric(plz1$plz1)
i[i<1 | i>2] <- 3
plz1$group <- c("Group A", "Group B", "Group X")[i]

And plot

plot(plz1, "group", lwd=3, mar=0, axes=FALSE, legend="topleft",
            col=c("pink", "light blue", "light yellow"))
lines(plz2, col=gray(.4))
text(plz2, "plz2", cex=.7, halo=TRUE)

You can fix the placement of the labels by choosing better locations for the ones that overlap. You can get the locations that are currently used with

p <- centroids(plz2, inside=TRUE)
xy <- crds(p)
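A minimal sketch of that adjustment, on two made-up rectangles rather than the real PLZ polygons: take the default label coordinates, move one of them by hand, and draw the labels at the edited positions. The offset is arbitrary and only for illustration.

```r
library(terra)

# Toy polygons (made-up coordinates) with 2-digit labels
v <- vect(c(
  "POLYGON ((0 0, 2 0, 2 1, 0 1, 0 0))",
  "POLYGON ((2 0, 4 0, 4 1, 2 1, 2 0))"
))
v$plz2 <- c("01", "02")

# Default label positions: one point inside each polygon
xy <- crds(centroids(v, inside = TRUE))

# Nudge the second label upward (column 2 holds the y coordinate)
xy[2, 2] <- xy[2, 2] + 0.25

# Draw the polygons, then the labels at the edited coordinates
plot(v)
text(xy[, 1], xy[, 2], labels = v$plz2, cex = 0.7)
```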

score 1 · Answer 3 · answered Feb 14 '23 at 18:53

Here is one option with tidyverse and ggplot. I have made a little change to your case_when because I thought you might be looking for those plz which start either with 1 or with 2.

library(tidyverse)
library(sf)

germany <- read_sf(here::here(
  "DATA", "Postleitzahlengebiete_-_OSM", "OSM_PLZ.shp"
))

de <- germany |>
  mutate(plz_group1 = case_when(
    substr(plz, 1, 1) == "1" ~ "Group A",
    substr(plz, 1, 1) == "2" ~ "Group B",
    TRUE ~ "Group X" # rest
  )) |>
  # Take the prefix from the character code: as.numeric() would drop
  # the leading zero of codes such as "01067"
  mutate(plz_group2 = substr(plz, 1, 2)) |>
  group_by(plz_group2, plz_group1) |>
  summarize()

ggplot() +
  geom_sf(data = de, aes(fill = plz_group1)) +
  geom_sf_label(data = de, aes(label = plz_group2))
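One thing to watch when extracting prefixes is that many PLZ start with a zero. The short base-R sketch below (with three example codes chosen for illustration) shows what happens to the 2-digit prefix if the code is converted to a number first.

```r
plz <- c("01067", "10117", "99084")

# Prefix taken from the character code keeps the leading zero
prefix_chr <- substr(plz, 1, 2)
prefix_chr
# [1] "01" "10" "99"

# After numeric conversion "01067" becomes 1067, so its prefix reads "10"
prefix_num <- substr(as.character(as.numeric(plz)), 1, 2)
prefix_num
# [1] "10" "10" "99"
```

In the second case the codes 01067 and 10117 end up in the same group, so the prefix is best taken while `plz` is still a character column.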
