
After converting my GeoDataFrame of 120,000+ point rows to one giant ee.FeatureCollection and trying to map() over it to extract topographic attributes, I found that Google Earth Engine aborts queries after accumulating more than 5000 elements. I have gotten two related errors from different functions:

Request payload size exceeds the limit: 10485760 bytes.

Collection query aborted after accumulating over 5000 elements

My workaround was to break the GeoDataFrame into smaller GeoDataFrames of 5000 rows each, convert each of them to an ee.FeatureCollection with geemap.geopandas_to_ee(), and put those into a list to iterate over. On each iteration I assigned the results to a placeholder GeoDataFrame and concatenated it with another GeoDataFrame holding the results from previous iterations.

import ee
import geemap
import geopandas as gpd
import pandas as pd

# Hard-coded 5000-row chunks, each converted to its own FeatureCollection
featCol_list = [
  geemap.geopandas_to_ee(gdf[0:5000]),
  geemap.geopandas_to_ee(gdf[5000:10000]),
  geemap.geopandas_to_ee(gdf[10000:15000]),
  ...
  geemap.geopandas_to_ee(gdf[115000:120000]),
  geemap.geopandas_to_ee(gdf[120000:])]

def get_topo(feat):
  # Sample elevation, slope, and aspect at each point (all server-side)
  srtm = ee.Image('USGS/SRTMGL1_003')
  elevation_img = srtm.select('elevation')
  slope_img = ee.Terrain.slope(srtm).select('slope')
  aspect_img = ee.Terrain.aspect(srtm).select('aspect')
  elevation = elevation_img.sample(feat.geometry(), scale=10).first().get('elevation')
  slope = slope_img.sample(feat.geometry(), scale=10).first().get('slope')
  aspect = aspect_img.sample(feat.geometry(), scale=10).first().get('aspect')
  return feat.set({'elevation': elevation,
                   'slope': slope,
                   'aspect': aspect})
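
(Side note: the three sample() calls can be collapsed into one by stacking the bands first. Here is a minimal sketch of that variant, using ee.Image.cat on the same SRTM source; get_topo_stacked is just an illustrative name, and I haven't benchmarked whether it actually speeds things up:)

def get_topo_stacked(feat):
  # Stack elevation, slope and aspect into a single 3-band image,
  # so each feature is sampled once instead of three times
  srtm = ee.Image('USGS/SRTMGL1_003')
  topo = ee.Image.cat([srtm.select('elevation'),
                       ee.Terrain.slope(srtm),
                       ee.Terrain.aspect(srtm)])
  sampled = topo.sample(feat.geometry(), scale=10).first()
  return feat.set({'elevation': sampled.get('elevation'),
                   'slope': sampled.get('slope'),
                   'aspect': sampled.get('aspect')})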

gdf_all = gpd.GeoDataFrame()
for featCol in featCol_list:
  topoCol = featCol.map(get_topo)
  gdf = geemap.ee_to_geopandas(topoCol)
  gdf = gdf.set_crs('EPSG:4326')
  # Concatenate new data with placeholder GeoDataFrame
  gdf_all = gpd.GeoDataFrame(pd.concat([gdf.copy(),gdf_all.copy()], ignore_index=True), crs='EPSG:4326')

My brute-force method got me the results in what I thought was a reasonable amount of time (~30 min), but it's ugly and tedious, and I definitely don't want to do this for a GeoDataFrame with 2 million rows. Is there a more elegant way to map() over a FeatureCollection with more than 5000 features, either server- or client-side?

1 Answer


I don't see a way around splitting your GeoDataFrame, but you don't need to hard-code the chunks. Use numpy.array_split (note that each chunk is still a GeoDataFrame, so it has to go through geemap.geopandas_to_ee() before you can map() over it):

import numpy as np

CHUNKSIZE = 5000

gdfs = []
# Split points start at CHUNKSIZE so the first chunk isn't empty
for chunk in np.array_split(gdf, range(CHUNKSIZE, len(gdf), CHUNKSIZE)):
  featCol = geemap.geopandas_to_ee(chunk)
  topoCol = featCol.map(get_topo)
  gdfs.append(geemap.ee_to_geopandas(topoCol).set_crs('EPSG:4326'))

gdf_all = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs='EPSG:4326')
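
If the per-feature sampling itself turns out to be the bottleneck, it may also be worth swapping the mapped get_topo for ee.Image.sampleRegions, which extracts band values for an entire chunk in one server-side call and copies each input feature's properties onto the output. A sketch under that assumption (untested against your data; the chunking stays, since the client-side conversion limits still apply):

srtm = ee.Image('USGS/SRTMGL1_003')
topo = ee.Image.cat([srtm.select('elevation'),
                     ee.Terrain.slope(srtm),
                     ee.Terrain.aspect(srtm)])

gdfs = []
for chunk in np.array_split(gdf, range(CHUNKSIZE, len(gdf), CHUNKSIZE)):
  featCol = geemap.geopandas_to_ee(chunk)
  # One sampleRegions call per chunk instead of one sample() per feature;
  # geometries=True keeps the point geometry for the round-trip back to GeoPandas
  sampled = topo.sampleRegions(collection=featCol, scale=10, geometries=True)
  gdfs.append(geemap.ee_to_geopandas(sampled).set_crs('EPSG:4326'))

gdf_all = gpd.GeoDataFrame(pd.concat(gdfs, ignore_index=True), crs='EPSG:4326')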
Corralien