0

I'm looking to join two tables with Apache Sedona (formerly GeoSpark) and getting unexpected differences between two approaches. In particular ST_Distance seems to produce some strange results, and I can't figure out if it's an issue with Sedona or (more likely) something about the way I'm using the function.

I'm trying to join where the objects are less than 5m apart from each other.

Approach one (using a buffer alongside intersects)

SELECT
  MAX(ST_Distance(left.geometry, right.geometry)
FROM
  left
  INNER JOIN
  right
  ON ST_INTERSECTS(left.geometry, ST_BUFFER(right.geometry, 5))

This produces an expected result (i.e. the biggest distance between any of these objects is about 5)

Approach two (using an evaluation of distance)

SELECT
  MAX(ST_Distance(left.geometry, right.geometry)
FROM
  left
  INNER JOIN
  right
  ON ST_DISTANCE(left.geometry, right.geometry) < 5

This returns a big number of about 400 which seems wrong.

To me, it seems like the results of the two queries should be more or less identical, especially as the latter looks like the Apache Documentation recommends it.

Is there a reason why these two queries should evaluate differently? What have I missed about the nature of these functions, or is this really a problem with Sedona?

houseofleft
  • 347
  • 1
  • 12
  • `ST_DISTANCE(left.geometry, left.geometry)` should always return 0, if that's your actual query. Perhaps if you considered the `right` geometry too? If that's just a typo in the question and not your real code, realise that it's far easier for us if you show us a query that *you have run* and verified it produces the results you claim. – Damien_The_Unbeliever Jun 18 '21 at 13:43
  • (I'm couching it like this since the first query should also suffer a similar fate) – Damien_The_Unbeliever Jun 18 '21 at 13:44
  • Thanks @Damien_The_Unbeliever, unfortunately it's just a typo in the question (have now updated). The query is identical to the actual query I ran, have just changed the table names to 'left' and 'right' to make it easier to read. – houseofleft Jun 18 '21 at 14:35
  • 1
    Is it full query, or some elements like projections or ST_TRANSFORM are omitted? I would say if this is full query, and there are no other typos, it should be a bug in Sedona and you should report it to Apache. – Michael Entin Jun 27 '21 at 18:21

0 Answers0