
I have a GeoDataFrame ("GDF") with a "values" column and a "geometry" column (the geometries are actual geographic regions), so each row represents a region.

The "values" column is zero in many rows and a large number in a few others.

I need to compute a "moving average" (or rolling average) over the nearest neighbors within a certain "max_distance" (we can assume the GDF has a locally projected CRS, so max_distance has a real meaning). That way, averaged_values would be neither zero nor very large in most regions, but an average value.

One way to do it might be something like this:

# pseudocode of the idea, not working code
for region in GDF:
    averaged_values = sjoin_nearest(GDF, GDF, max_distance=1000).values.mean()

But I don't really know how to proceed.

The expected output would be a GeoDataFrame with three columns: "values", "averaged_values", and "geometry".

Any ideas?

ElTitoFranki
Please add more information, such as the GeoDataFrame and some code showing what you have attempted. – DPM May 31 '22 at 18:03

1 Answer


What you are trying to do is known as a spatial lag. The best way is to create a spatial weights matrix based on a set distance and then compute the lag, both using the libpysal library, which is part of the geopandas ecosystem.
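For reference, with binary distance-band weights and row standardisation, the lag for region $i$ is just the mean of its neighbours' values. In the notation below (mine, not libpysal's), $y_j$ is the "values" entry of region $j$, $c_i$ is the centroid of region $i$, $d$ is Euclidean distance in the projected CRS, and $d_{\max}$ is the threshold (1000 in the code). A region with no neighbours within $d_{\max}$ (an "island") gets a lag of 0.

$$
\bar{y}_i = \sum_{j} w_{ij}\, y_j,
\qquad
w_{ij} =
\begin{cases}
\dfrac{1}{|N_i|} & \text{if } j \in N_i \\[4pt]
0 & \text{otherwise}
\end{cases},
\qquad
N_i = \{\, j \neq i : d(c_i, c_j) \le d_{\max} \,\}
$$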

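import libpysal

# gdf is the GeoDataFrame from the question, in a projected CRS

# create binary distance-band weights: regions whose centroids are
# within 1000 CRS units (e.g. metres) of each other are neighbours
W = libpysal.weights.DistanceBand.from_dataframe(gdf, threshold=1000)

# row-normalise weights so that each region's weights sum to 1
W.transform = "r"

# create lag: for every region, the mean of its neighbours' "values"
gdf["averaged_values"] = libpysal.weights.lag_spatial(W, gdf["values"])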
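To make it clearer what those three libpysal steps compute, here is a dependency-free sketch of the same calculation. It rests on some assumptions: regions are reduced to centroid points (as far as I know, this is what `DistanceBand.from_dataframe` does for polygon geometries), pairs exactly at the threshold count as neighbours, and a region is not its own neighbour, matching the lag above. `distance_band_lag` and its `include_self` flag are hypothetical names for illustration, not libpysal API. Setting `include_self=True` also counts each region's own value, which is closer to a "moving average" in the usual sense.

```python
import math


def distance_band_lag(points, values, threshold, include_self=False):
    """Row-standardised distance-band spatial lag (illustrative sketch).

    points: list of (x, y) tuples, e.g. region centroids in a projected CRS.
    values: list of numbers, one per point.
    threshold: maximum distance (CRS units) for two points to be neighbours.
    include_self: hypothetical flag; if True, each point is also its own
        neighbour, so its own value enters the average.
    """
    n = len(points)
    lag = []
    for i in range(n):
        xi, yi = points[i]
        # binary weights: j is a neighbour of i if it lies within the threshold
        nbrs = [
            j for j in range(n)
            if (include_self or j != i)
            and math.hypot(points[j][0] - xi, points[j][1] - yi) <= threshold
        ]
        if not nbrs:
            # an "island" with no neighbours: its row-standardised lag is 0
            lag.append(0.0)
        else:
            # row standardisation gives every neighbour weight 1 / len(nbrs),
            # so the lag is simply the neighbours' mean value
            lag.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lag


points = [(0, 0), (500, 0), (1000, 0), (5000, 0)]
values = [0, 0, 900, 40]
print(distance_band_lag(points, values, 1000))
print(distance_band_lag(points, values, 1000, include_self=True))
```

With a GeoDataFrame you might build the points as `[(p.x, p.y) for p in gdf.centroid]`. Note how the last point, 4000 units from everything else, is an island: its lag is 0 without `include_self`, and just its own value with it.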
martinfleis