I have three GeoTIFFs, each roughly 500 MB in size, stored on AWS S3, which I am trying to process on an EMR cluster using Dask, but I get a MemoryError after processing the first TIFF.
After reading a GeoTIFF with xarray.open_rasterio(), I convert the grid values to boolean and then multiply the array by a floating-point value, roughly as in the sketch below. This workflow has run successfully on three 50 MB GeoTIFFs. I have also tried chunking when reading with xarray, but I get the same MemoryError.
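Here is a minimal sketch of the workflow; the bucket path, chunk sizes, threshold, and scale factor are placeholders, and it assumes rasterio/GDAL is configured to read from S3:

```python
import xarray as xr

# Placeholder S3 URI -- assumes rasterio/GDAL can access the bucket (e.g. via /vsis3/).
da = xr.open_rasterio(
    "s3://my-bucket/raster-1.tif",
    chunks={"band": 1, "x": 2048, "y": 2048},  # Dask-backed, lazy chunks
)

# Both operations are lazy on the Dask-backed array.
mask = da > 0        # placeholder threshold for the boolean conversion
scaled = mask * 0.5  # placeholder floating-point factor

# Nothing is materialized until here; this is where the MemoryError appears.
result = scaled.compute()
```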
Is there a size limitation in Dask, or is there some other issue I could be running into?