
Using the answer from a previous question (Structure Query for osmextract), I want to use the R package osmextract to get a subset of keys/tags (e.g. shop=supermarket) for the whole Europe.osm.pbf from Geofabrik.

I am wondering whether it is possible to stop oe_get() from translating ALL features of a layer (e.g. point, line or multipolygon) into a gpkg in the first place, and instead have it translate only a subset of features into the gpkg (e.g. only amenity=school).

The previous response does this in two steps:

  1. Load the data and put each layer into a gpkg
  2. Select specific features from a layer within this gpkg with a query

This produces a possibly huge gpkg in the first step, from which a selection is then made. Doing both steps in one would save processing time and disk space. I am not used to working with databases or similar tools, so I don't know whether this kind of procedure is even possible with pbf files, or whether the translation into a gpkg is necessary before a query of this sort can be run at all.
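For context on whether the gpkg step is required: the conversion osmextract performs is GDAL's vectortranslate (the library form of ogr2ogr), which reads the .osm.pbf directly through GDAL's OSM driver, and ogr2ogr accepts a `-where` filter at that same stage. The sketch below is not from the question; it assumes the sf package and an already downloaded file at the hypothetical path `geofabrik_austria-latest.osm.pbf`, and writes only the matching multipolygons (where `shop` is a default column) to a new, hypothetically named GeoPackage. GDAL still has to scan the whole .osm.pbf, so the saving is mainly in output size and write time rather than in parsing.

```r
# Sketch: let GDAL filter features while it reads the .osm.pbf, so that only
# matching multipolygons are written (no full intermediate gpkg).
pbf_path  <- "geofabrik_austria-latest.osm.pbf"  # hypothetical local input path
gpkg_path <- "austria_shops.gpkg"                 # hypothetical output file

translate_opts <- c(
  "-f", "GPKG",                                                # output format
  "-where", "shop IN ('convenience', 'greengrocer', 'supermarket')",
  "multipolygons"                  # layer of GDAL's OSM driver to convert
)

# Only run when the pbf is present and sf is installed
if (file.exists(pbf_path) && requireNamespace("sf", quietly = TRUE)) {
  sf::gdal_utils(
    util = "vectortranslate",
    source = pbf_path,
    destination = gpkg_path,
    options = translate_opts
  )
  shops <- sf::st_read(gpkg_path, quiet = TRUE)
}
```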

Here is my attempt with a mid-sized pbf file (Austria):

```r
library(osmextract)

## STEP 1: (Download and) convert the data into a gpkg (the download is skipped here because the file already exists)

## Each layer (lines, multipolygons, points) is added in turn to the gpkg 'geofabrik_austria-latest.gpkg'
## download_only = TRUE returns the file path instead of reading the layer into R

oe_get("Austria", layer = "lines", provider = "geofabrik", force_download = FALSE, extra_tags = c("maxspeed", "oneway"), download_only = TRUE)
# This took about 8 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "multipolygons", provider = "geofabrik", force_download = FALSE, download_only = TRUE)
# This took about 13 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)
oe_get("Austria", layer = "points", provider = "geofabrik", force_download = FALSE, extra_tags = c("amenity", "shop"), download_only = TRUE)
# This took about 2 minutes on my laptop (pbf file size is 2.5% of Europe.osm.pbf)

# Total processing time: about 23 min (Europe.osm.pbf is 40 times bigger, so an estimated 15.3 h)
# gpkg file size is 5.9 times the pbf file size (geofabrik_europe-latest.gpkg is estimated at 158.4 GB)

## STEP 2: Query the gpkg to select specific keys/features
## (`query` is passed on to sf::st_read() and runs against the existing gpkg)

austria_roads_lines_gpkg <- oe_get(
  place = "Austria",
  layer = "lines",
  query = "SELECT * FROM lines WHERE highway IN ('motorway', 'trunk', 'primary', 'secondary', 'tertiary', 'unclassified', 'residential', 'motorway_link', 'trunk_link', 'primary_link', 'secondary_link', 'tertiary_link', 'living_street', 'service')",
  quiet = TRUE
)
austria_shops_multipoly_gpkg <- oe_get(
  place = "Austria",
  layer = "multipolygons",
  query = "SELECT * FROM multipolygons WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
austria_shops_points_gpkg <- oe_get(
  place = "Austria",
  layer = "points",
  query = "SELECT * FROM points WHERE shop IN ('convenience', 'greengrocer', 'supermarket')",
  quiet = TRUE
)
```
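The extrapolation in the Step 1 comments can be reproduced with a few lines of R. This is only a rough check: like the comments, it assumes conversion time scales linearly with pbf size, and all inputs are the numbers reported above (the implied Europe pbf size is derived from them, not measured).

```r
# Reported Austria conversion times per layer, in minutes
austria_min <- c(lines = 8, multipolygons = 13, points = 2)
size_ratio  <- 40     # Europe pbf is ~40x the Austria pbf (reported)
gpkg_to_pbf <- 5.9    # observed gpkg/pbf size ratio for Austria
europe_gpkg_gb <- 158.4  # reported estimate for geofabrik_europe-latest.gpkg

total_min <- sum(austria_min)                     # total Austria minutes
europe_hours <- total_min * size_ratio / 60       # linear extrapolation, hours
implied_pbf_gb <- europe_gpkg_gb / gpkg_to_pbf    # Europe pbf size implied by the estimate

cat(sprintf("Austria: %d min; Europe: ~%.1f h; implied Europe pbf: ~%.1f GB\n",
            as.integer(total_min), europe_hours, implied_pbf_gb))
```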
Asked by JiannisK

I actually found a solution myself by applying an SQL-like query within a character vector of options that is passed to ogr2ogr during the vectortranslate process. Should I post the details as an answer or delete the question? – JiannisK Oct 20 '22 at 10:16
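The approach described in that comment can be sketched with osmextract's `vectortranslate_options` argument, whose elements are handed to ogr2ogr when the .osm.pbf is converted. This is a minimal, unverified sketch, not necessarily the asker's exact code: it assumes a recent osmextract release that fills in its default conversion options (output format, osmconf.ini, layer name) when they are missing, it uses `extra_tags = "shop"` because `shop` is not a default column of the points layer, and it sets `force_vectortranslate = TRUE` so an existing, unfiltered gpkg from Step 1 is not simply reused.

```r
# Sketch: filter features while osmextract converts the .osm.pbf to .gpkg,
# by passing an ogr2ogr "-where" clause through `vectortranslate_options`.
shop_values <- c("convenience", "greengrocer", "supermarket")

# Build "shop IN ('convenience', 'greengrocer', 'supermarket')"
shop_where <- paste0(
  "shop IN (", paste0("'", shop_values, "'", collapse = ", "), ")"
)

# Options handed to ogr2ogr (GDAL vectortranslate) during the conversion
shop_opts <- c("-where", shop_where)

# Only run interactively: this downloads (if needed) and converts Austria
if (interactive()) {
  austria_shops_points <- osmextract::oe_get(
    place = "Austria",
    layer = "points",
    provider = "geofabrik",
    extra_tags = "shop",                 # make `shop` a column of the points layer
    vectortranslate_options = shop_opts, # filter applied during conversion
    force_vectortranslate = TRUE,        # do not reuse an existing unfiltered gpkg
    quiet = TRUE
  )
}
```

The same pattern should extend to the roads query from Step 2 by swapping in a `highway IN (...)` clause and `layer = "lines"`.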
