
I'm building a transition matrix of land use change (state) over the years.

I'm therefore comparing the shapefiles year after year and building a data frame with the columns:

Landuse year1 - Landuse year2 - ... - ID - centroid

using the following function:

full_join(landuse1, landuse2, by="centroid") 

where centroid is the actual centroid of the polygons. A centroid is basically a vector of two numeric values.

However, the centroid can shift slightly from year to year (because the polygon itself changes a little), which leads to incomplete data gathering through full_join, because the centroids must match exactly.
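
To illustrate the problem, here is a minimal sketch with hypothetical toy data. The column names `landuse_y1`, `landuse_y2`, `cx` and `cy` are assumptions, with the centroid stored as two numeric columns. It uses base R `merge(..., all = TRUE)`, which performs the same full outer join as `dplyr::full_join`. A shift of only 0.004 map units is enough to split one polygon into two unmatched rows.

```r
# Hypothetical toy data: centroid stored as two numeric columns (cx, cy)
landuse1 <- data.frame(
  landuse_y1 = c("forest", "crop"),
  cx = c(100.000, 250.000),
  cy = c(500.000, 800.000)
)
landuse2 <- data.frame(
  landuse_y2 = c("forest", "urban"),
  cx = c(100.000, 250.004),  # second polygon's centroid shifted by 0.004
  cy = c(500.000, 800.000)
)

# Full outer join on exact coordinates (base-R equivalent of dplyr::full_join)
joined <- merge(landuse1, landuse2, by = c("cx", "cy"), all = TRUE)
# joined has 3 rows: the shifted polygon appears twice, each row with an NA
print(joined)
```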

I'd like to include a "more or less" argument, so that any centroid close enough to the one from the previous year can be joined to the data frame for that particular polygon. But I'm not sure how to do this.

Thank you in advance.

Sarahdata

1 Answer


So the general term for what you are trying to do is fuzzy matching. I'm not sure exactly how it would work for the coordinates of a centroid. My idea would be to calculate the distance between the coordinates and then set a margin of error, say 0.5%; if they deviate from each other by less than that, you could declare it a match. Basically, loop through your list of locations and give the matches a unique ID, which you can then use for the join.
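
The idea above can be sketched in base R as follows. This is an illustrative sketch, not a built-in function: `match_centroids()`, the columns `id`, `cx`, `cy`, and the tolerance `tol` are hypothetical, and `tol` is an absolute distance in map units rather than the 0.5% relative margin mentioned above. Each year-2 centroid is matched to its nearest year-1 centroid; matches within `tol` inherit the year-1 ID, the rest get new IDs, and the join is then done on `id`.

```r
# Hypothetical helper: give each current-year polygon the ID of the nearest
# previous-year centroid if it lies within `tol` map units, else a new ID.
# Assumes `prev` is non-empty and has an integer `id` column.
match_centroids <- function(prev, curr, tol) {
  nearest <- vapply(seq_len(nrow(curr)), function(i) {
    # Euclidean distance from centroid i to every previous-year centroid
    d <- sqrt((prev$cx - curr$cx[i])^2 + (prev$cy - curr$cy[i])^2)
    j <- which.min(d)
    if (d[j] <= tol) as.integer(prev$id[j]) else NA_integer_
  }, integer(1))
  # Unmatched polygons get fresh IDs that do not clash with existing ones
  unmatched <- is.na(nearest)
  nearest[unmatched] <- max(prev$id) + seq_len(sum(unmatched))
  curr$id <- nearest
  curr
}

# Hypothetical toy data; centroid stored as two numeric columns (cx, cy)
landuse1 <- data.frame(id = 1:2, landuse_y1 = c("forest", "crop"),
                       cx = c(100, 250), cy = c(500, 800))
landuse2 <- data.frame(landuse_y2 = c("forest", "urban", "water"),
                       cx = c(100, 250.004, 900), cy = c(500, 800, 100))

landuse2 <- match_centroids(landuse1, landuse2, tol = 0.01)
# Join on the shared ID instead of the raw coordinates
joined <- merge(landuse1[, c("id", "landuse_y1")],
                landuse2[, c("id", "landuse_y2")], by = "id", all = TRUE)
```

Because every year-2 centroid is compared with every year-1 centroid, this costs on the order of n * m distance computations. It also does not stop two year-2 polygons from claiming the same year-1 ID.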

  • Thank you, Sven, for your answer. However, that will take forever: I have 1,812,000 polygons to loop through. Isn't there a built-in fuzzy join function? – Sarahdata Aug 25 '22 at 12:02
  • https://stackoverflow.com/questions/20590119/fuzzy-matching-of-coordinates I found another post on a similar topic. I think calculating the distances between coordinates should be fairly quick, even for a dataset as large as yours. Calculating the deviation as a percentage should not take long either, since it's a linear operation. – Sven Asmussen Aug 25 '22 at 12:08
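
One way to avoid comparing every pair of centroids is grid bucketing (a simple spatial hash). This technique is swapped in here as an assumption, not taken from the linked post. Snap each centroid to a grid with cell size `tol`, then compare each year-2 centroid only with year-1 centroids in its own cell and the 8 neighboring cells. Two points within `tol` of each other always fall in the same or adjacent cells, so, up to floating-point rounding at cell edges, no match is missed. `fast_match()` and the column names are hypothetical. For real shapefiles, the sf package (e.g. `st_nearest_feature()`) is another option worth checking.

```r
# Hypothetical helper: grid bucketing. With cells of side `tol`, two centroids
# within `tol` of each other lie in the same or adjacent cells, so only the
# 3 x 3 block of neighboring cells needs to be searched.
fast_match <- function(prev, curr, tol) {
  prev$gx <- floor(prev$cx / tol)
  prev$gy <- floor(prev$cy / tol)
  curr$row <- seq_len(nrow(curr))
  # merge() with no common columns is a cross join: 9 candidate cells per centroid
  offsets <- expand.grid(dx = -1:1, dy = -1:1)
  cand <- merge(curr[, c("row", "cx", "cy")], offsets)
  cand$gx <- floor(cand$cx / tol) + cand$dx
  cand$gy <- floor(cand$cy / tol) + cand$dy
  # Pair each current centroid only with previous centroids in those cells
  pairs <- merge(cand, prev[, c("id", "cx", "cy", "gx", "gy")],
                 by = c("gx", "gy"), suffixes = c("", ".prev"))
  pairs$d <- sqrt((pairs$cx - pairs$cx.prev)^2 + (pairs$cy - pairs$cy.prev)^2)
  pairs <- pairs[pairs$d <= tol, ]
  # Keep only the nearest previous centroid for each current centroid
  pairs <- pairs[order(pairs$row, pairs$d), ]
  best <- pairs[!duplicated(pairs$row), ]
  curr$id <- NA_integer_
  curr$id[best$row] <- best$id
  # Unmatched polygons get fresh IDs that do not clash with existing ones
  unmatched <- is.na(curr$id)
  curr$id[unmatched] <- max(prev$id) + seq_len(sum(unmatched))
  curr$row <- NULL
  curr
}

# Hypothetical toy data; centroid stored as two numeric columns (cx, cy)
landuse1 <- data.frame(id = 1:2, landuse_y1 = c("forest", "crop"),
                       cx = c(100, 250), cy = c(500, 800))
landuse2 <- data.frame(landuse_y2 = c("forest", "urban", "water"),
                       cx = c(100, 250.004, 900), cy = c(500, 800, 100))

landuse2 <- fast_match(landuse1, landuse2, tol = 0.01)
joined <- merge(landuse1[, c("id", "landuse_y1")],
                landuse2[, c("id", "landuse_y2")], by = "id", all = TRUE)
```

Everything is done with vectorized `merge()` calls instead of a per-polygon loop, at the cost of a candidate table nine times larger than the input.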