
I am trying to generate a grid for a given (multi)polygon, where by "grid" I mean the collection of H3 indices whose cells fall within the (multi)polygon boundary.

Here is the code that I implemented so far:

    import logging
    import time

    import geopandas as gpd
    import pandas as pd
    import h3pandas  # noqa: F401 -- registers the .h3 accessor on (Geo)DataFrames


    def generate_grid(region_bounds: gpd.GeoDataFrame) -> pd.DataFrame:
        """
        Generates an H3 resolution 10 grid using the h3.polyfill method.
        For more detail see https://geographicdata.science/book/data/h3_grid/build_sd_h3_grid.html

        Returns: a dataframe with the following columns:
            <index, h3_res_10>
        """
        logging.info("Start grid generation")
        start = time.time()
        resolution = 10
        grid = region_bounds.h3.polyfill(resolution)
        end = time.time()
        logging.info(f"grid generation took {end - start} sec")
        # polyfill stores the cell ids as a list in the 'h3_polyfill' column;
        # this assumes region_bounds has a single row
        grid_df = pd.DataFrame({"h3_res_10": grid["h3_polyfill"].iloc[0]})
        logging.info(grid_df)
        return grid_df

The problem occurs when the region to process is large, like a country or a state. Is there any way to run polyfill in parallel over multiple subregions? How can I efficiently split the region into subregions so that h3.polyfill can run in parallel?

user1877600

1 Answer


Yes, polyfill (polygonToCells in H3 v4) can be CPU- and memory-intensive for large regions at fine H3 resolutions. A res 10 cell is roughly a city block, so a large country will likely contain millions of cells.

The best option at the moment is to split the input into contiguous polygons; the resulting cells will not overlap as long as the input polygons do not overlap. In Python, you could use Shapely to split the polygon, simply taking vertical slices with successive north-south lines as the splitters (shapely.ops.split), as in the sketch below.
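A rough sketch of that splitting step, assuming a single shapely Polygon in lon/lat coordinates (for a MultiPolygon you would split each component separately); split_into_slices and n_slices are illustrative names, not part of any library:

    import numpy as np
    from shapely.geometry import LineString
    from shapely.ops import split

    def split_into_slices(polygon, n_slices):
        """Cut a polygon into vertical slices with north-south splitter lines."""
        minx, miny, maxx, maxy = polygon.bounds
        # x coordinates of the interior cut lines (the two endpoints are excluded)
        cuts = np.linspace(minx, maxx, n_slices + 1)[1:-1]
        pieces = [polygon]
        for x in cuts:
            # extend the splitter slightly past the bounds so it cuts cleanly
            splitter = LineString([(x, miny - 1.0), (x, maxy + 1.0)])
            pieces = [part for piece in pieces
                      for part in split(piece, splitter).geoms]
        return pieces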

The resulting set of polygons can then be processed either serially or in parallel with much less memory pressure. Total CPU time will be roughly the same or slightly higher, but it can be spread across workers.
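A minimal end-to-end sketch of the parallel step, assuming the h3pandas accessor from the question and the split_into_slices helper above; generate_grid_parallel, polyfill_piece, n_slices, and max_workers are names made up for illustration:

    from concurrent.futures import ProcessPoolExecutor

    import geopandas as gpd
    import pandas as pd
    import h3pandas  # noqa: F401 -- registers the .h3 accessor

    RESOLUTION = 10

    def polyfill_piece(piece):
        """Polyfill one sub-polygon; runs in a worker process."""
        gdf = gpd.GeoDataFrame(geometry=[piece], crs="EPSG:4326")
        filled = gdf.h3.polyfill(RESOLUTION)
        return pd.DataFrame({"h3_res_10": filled["h3_polyfill"].iloc[0]})

    def generate_grid_parallel(region_bounds, n_slices=8, max_workers=4):
        # merge all rows into one geometry, then slice it
        pieces = split_into_slices(region_bounds.unary_union, n_slices)
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            frames = list(executor.map(polyfill_piece, pieces))
        # defensive: a cell centroid sitting exactly on a cut line could
        # land in two slices, so deduplicate before returning
        return pd.concat(frames, ignore_index=True).drop_duplicates("h3_res_10")

Processes rather than threads sidestep any GIL concerns, at the cost of pickling the sub-polygons; for very large countries you may want more slices than workers so that per-worker memory stays bounded.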

nrabinowitz