
I am rasterizing polygons in a large raster using the following code:

import rasterio.features
from shapely.geometry import Polygon

p1 = Polygon([[0,0], [32000,0], [32000,32000], [0,0]])
out_shape = (32000, 32000)
# The default transform is fine here
r = rasterio.features.rasterize([p1], out_shape=out_shape)

This operation works fine when the raster is smaller: for an out_shape of (10000, 10000) it takes a couple of seconds. However, it fails for the given shape of (32000, 32000).

I looked into the code for rasterio.features.rasterize, and its docstring mentions:

If GDAL max cache size is smaller than the output data, the array of shapes will be iterated multiple times. Performance is thus a linear function of buffer size. For maximum speed, ensure that GDAL_CACHEMAX is larger than the size of out or out_shape.
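For scale, here is a quick back-of-the-envelope sketch (assuming a uint8 output array, which rasterize typically produces for the default fill and value of 0/1) comparing the output buffer size with the current GDAL cache limit:

import numpy as np
from osgeo import gdal

out_shape = (32000, 32000)
# Size of a uint8 output buffer: 32000 * 32000 bytes, roughly 1 GB
buffer_bytes = int(np.prod(out_shape)) * np.dtype("uint8").itemsize
print(f"Output buffer:  {buffer_bytes / 1e9:.2f} GB")
# GetCacheMax() reports the current GDAL block cache limit in bytes
print(f"GDAL cache max: {gdal.GetCacheMax() / 1e9:.2f} GB")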

I increased the GDAL_CACHEMAX using

from osgeo import gdal
max_gdal_cache_gb = 64
gdal.SetCacheMax(int(max_gdal_cache_gb * 1e9))

However, rasterio is still unable to rasterize the large raster. I am also not sure whether GDAL_CACHEMAX was actually increased. How can I fix this?
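
One way to check whether the setting took effect is to read the limit back with gdal.GetCacheMax(). A minimal sketch (note that if rasterio was installed from a binary wheel it ships its own copy of GDAL, so a limit set through osgeo.gdal may not apply to the GDAL that rasterio actually uses; setting GDAL_CACHEMAX through rasterio.Env targets the library rasterio uses):

from osgeo import gdal

max_gdal_cache_gb = 64
gdal.SetCacheMax(int(max_gdal_cache_gb * 1e9))
# Read the limit back (in bytes) to confirm the call took effect
print(f"GDAL cache max: {gdal.GetCacheMax() / 1e9:.2f} GB")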

KarateKid

1 Answer


It turns out that the above code worked fine; the issue was in the next step. Anyway, here is an improved version of the code, which also prints the max cache size:

import rasterio
import rasterio.features
from shapely.geometry import Polygon
from osgeo import gdal

max_gdal_cache_gb = 64

# Create a global rasterio environment
global_env = rasterio.Env(GDAL_CACHEMAX=int(max_gdal_cache_gb * 1e9))

with global_env:
    # Read back the GDAL_CACHEMAX config option to confirm it was set
    rasterio_cache_size = gdal.GetConfigOption("GDAL_CACHEMAX")
    print(f"Rasterio cache size: {rasterio_cache_size} bytes")


p1 = Polygon([[0,0], [32000,0], [32000,32000], [0,0]])
out_shape = (32000, 32000)
# The default transform is fine here
with global_env:
    r = rasterio.features.rasterize([p1], out_shape=out_shape)
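
As a follow-up, if the next step is writing the result to disk, here is a minimal sketch of that (the output path out.tif and the identity transform are illustrative assumptions, not part of the original code):

from affine import Affine

with global_env:
    # Persist the rasterized array as a single-band GeoTIFF; the identity
    # transform mirrors the default transform used by rasterize above.
    with rasterio.open(
        "out.tif",
        "w",
        driver="GTiff",
        height=r.shape[0],
        width=r.shape[1],
        count=1,
        dtype=r.dtype,
        transform=Affine.identity(),
    ) as dst:
        dst.write(r, 1)
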
KarateKid