I'm looking to join two tables with Apache Sedona (formerly GeoSpark) and getting unexpected differences between two approaches. In particular ST_Distance seems to produce some strange results, and I can't figure out if it's an issue with Sedona or (more likely) something about the way I'm using the function.
I'm trying to join where the objects are less than 5m apart from each other.
Approach one (using a buffer alongside intersects)
SELECT
MAX(ST_Distance(left.geometry, right.geometry)
FROM
left
INNER JOIN
right
ON ST_INTERSECTS(left.geometry, ST_BUFFER(right.geometry, 5))
This produces an expected result (i.e. the biggest distance between any of these objects is about 5)
Approach two (using an evaluation of distance)
SELECT
MAX(ST_Distance(left.geometry, right.geometry)
FROM
left
INNER JOIN
right
ON ST_DISTANCE(left.geometry, right.geometry) < 5
This returns a big number of about 400 which seems wrong.
To me, it seems like the results of the two queries should be more or less identical, especially as the latter looks like the Apache Documentation recommends it.
Is there a reason why these two queries should evaluate differently? What have I missed about the nature of these functions, or is this really a problem with Sedona?