I've found a shapefile for 4 digit postal codes in the Netherlands here:
What I'm trying to do is to combine postcodes that share the first two digits, and then plot them on a map.
Unfortunately there seems to be a problem with the data - there are some areas in the Netherlands that don't seem to be covered. As a result, when I combine the polygons to get the 2-digit polygons (from the 4-digit ones), there are holes in the resulting multipolygons.
I need to fill in these holes. I've seen posts for similar issues but nothing seems to do exactly what I need. In particular, I saw a post about using concave hulls, but this seems like overkill here.
There are several artifacts I've managed to fix (e.g. "slivers") but the holes remain.
Here's what I have so far:
from shapely.ops import cascaded_union
from shapely.geometry import JOIN_STYLE, Polygon, MultiPolygon
import os
import folium
import pandas as pd
import geopandas as gpd
GEOM_LOC = r"PATH_TO_FILE_ABOVE\openpostcodevlakkenpc4.shx"
# Get the data
geom = gpd.read_file(GEOM_LOC)
# Remove empty or nan records
is_empty = geom.geometry.is_empty
is_nan = geom.geometry.isna()
geom = geom[~(is_empty | is_nan)]
# Add field for 2 digit postcode
geom["digit"] = geom.pc4.apply(lambda x: x[0:2])
geom = geom[["digit", "geometry"]]
# Make dataframe for 2-digit zones
tags = list(set(geom["digit"]))
df = pd.DataFrame(tags, columns=["tag"])
# Function returning union of geometries
def combine_borders(*geoms):
return cascaded_union([
geom if geom.is_valid else geom.buffer(0) for geom in geoms
])
# Wrapping function above for application to dataframe
def combine(tag):
sub_geom = geom[geom["digit"] == tag]
bounds = list(sub_geom.geometry)
return(combine_borders(*bounds))
# Apply the function
df["boundary"] = df.tag.apply(lambda x: combine(x))
# The polygons generated above do not fit perfectly together,
# resulting in artifacts. We fix that now
eps = 0.00001
def my_buffer(area):
return(
area.buffer(eps, 1, join_style=JOIN_STYLE.mitre).buffer(-eps, 1, join_style=JOIN_STYLE.mitre)
)
df["boundary"] = df.boundary.apply(lambda x: my_buffer(x))
# Have a look at one of the holes
test = geom[geom.digit == "53"]
test = df.loc[df.tag == "53", "boundary"].item()
m = folium.Map(location=[51.8, 5.4], zoom_start=10, tiles="CartoDB positron")
m.choropleth(test)
m
As you see, I've tested on the two-digit zone "53". There is a hole that I need to fill:
I realise that the underlying geometry is quite complicated, but is there a straightforward way to fill this "hole"?
Many thanks for your help.
EDIT - 2020-04-25-16.14
For reference, here is the 2 digit postcode area "53" from wikipedia:
So it's not that there's a postcode-within-a-postcode - a postcode enclave.
EDIT - 2020-04-27
I found this post:
Convert Multipolygon to Polygon in Python
Applying the code
no_holes = MultiPolygon(Polygon(p.exterior) for p in m)
to my multipolygon closes the hole: