Using the answer from a previous question (Structure Query for osmextract), I want to use the R package osmextract to extract a subset of keys/tags (e.g. shop=supermarket) from the whole Europe.osm.pbf from Geofabrik.
I am wondering whether it is possible to stop oe_get() from translating ALL features of a layer (e.g. points, lines or multipolygons) into a gpkg in the first place, and instead have it translate only a subset of features into the gpkg (e.g. only amenity=school).
The previous response does this in two steps:
- Load the data and put each layer into a gpkg
- Select specific features from a layer within this gpkg with a query
This produces a possibly huge gpkg in the first step, from which a selection is then made. Doing both steps in one would save processing time and disk space. I am not used to working with databases or similar, which is why I don't know whether this kind of procedure is even possible with pbf files, or whether translating into a gpkg is necessary before a query of this sort can be run at all.
Here is some example code from my attempt with a mid-sized pbf file (Austria):
library(osmextract)
## STEP 1: (Download and) convert data into gpkg. (NB: skipping the download here because the file is already downloaded)
## Each layer (lines, multipolygons, points) is added in turn to the gpkg 'geofabrik_austria-latest'
oe_get("Austria", layer = "lines", provider = "geofabrik", force_download = FALSE, extra_tags = c("maxspeed", "oneway"), download_only = TRUE)
# This took about 8 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "multipolygons", provider = "geofabrik", force_download = FALSE, download_only = TRUE)
# This took about 13 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "points", provider = "geofabrik", force_download = FALSE, extra_tags = c("amenity", "shop"), download_only = TRUE)
# This took about 2 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
# Total processing time: about 23 mins (the Europe.osm.pbf file is 40 times bigger, so an estimated 15.3 hrs)
# The gpkg is 5.9 times bigger than the pbf (estimated size of geofabrik_europe-latest.gpkg: 158.4 GB)
## STEP 2: Manipulate gpkg: Select keys/features
austria_roads_lines_gpkg <- oe_get(
  place = "Austria",
  layer = "lines",
  query = "SELECT * FROM lines WHERE highway IN ('motorway', 'trunk', 'primary', 'secondary', 'tertiary', 'unclassified', 'residential', 'motorway_link', 'trunk_link', 'primary_link', 'secondary_link', 'tertiary_link', 'living_street', 'service')",
  quiet = TRUE
)
austria_shops_multipoly_gpkg <- oe_get(
  place = "Austria",
  layer = "multipolygons",
  query = "SELECT * FROM multipolygons WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
austria_shops_points_gpkg <- oe_get(
  place = "Austria",
  layer = "points",
  query = "SELECT * FROM points WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
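For comparison, here is a rough and untested sketch of what a one-step version might look like. It assumes that the `vectortranslate_options` argument of oe_get() hands ogr2ogr's `-where` filter to the pbf-to-gpkg translation, and that recent osmextract versions add the remaining default options themselves. `tag_filter()` and `oe_get_filtered()` are hypothetical helper names made up for this illustration; they are not part of osmextract.

```r
# Hypothetical helper (not part of osmextract): builds ogr2ogr's "-where"
# option from one OSM key and the tag values that should be kept.
tag_filter <- function(key, values) {
  quoted <- paste0("'", values, "'", collapse = ", ")
  c("-where", paste0(key, " IN (", quoted, ")"))
}

# Hypothetical wrapper: apply the filter while the pbf is translated, so that
# (under the assumption above) only matching features are written to the gpkg.
# osmextract is only needed when this function is actually called.
oe_get_filtered <- function(place, layer, key, values, extra_tags = NULL, ...) {
  osmextract::oe_get(
    place = place,
    layer = layer,
    extra_tags = extra_tags,          # the filtered key must exist as a column
    vectortranslate_options = tag_filter(key, values),
    force_vectortranslate = TRUE,     # redo the translation even if the layer exists
    ...
  )
}

# Usage (not run here: needs osmextract and the Austria pbf):
# austria_shops_points <- oe_get_filtered(
#   "Austria", "points", "shop",
#   c("convenience", "greengrocer", "supermarket"),
#   extra_tags = "shop"
# )
```

If this works as assumed, the layer written to the gpkg would contain only the filtered features, so any later query on the same gpkg sees just that subset. For the points layer, `shop` is not one of the default columns, hence `extra_tags = "shop"` in the usage example.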