
I'm converting a process from PostgreSQL over to Databricks Apache Spark.

The PostgreSQL process uses the following SQL function to get the point on a map from an X and Y value: `ST_Transform(ST_SetSrid(ST_MakePoint(x, y), 4326), 3857)`
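For context, here is a minimal plain-Python sketch of what that expression computes: reprojecting a longitude/latitude point from EPSG:4326 into Web Mercator (EPSG:3857). The function name is my own and this is purely illustrative, not a substitute for a real geospatial library:

```python
import math

# WGS84 semi-major axis in metres; Web Mercator treats the Earth as a
# sphere of this radius.
R = 6378137.0

def wgs84_to_web_mercator(lon, lat):
    """Project a (longitude, latitude) pair in degrees (EPSG:4326)
    to Web Mercator metres (EPSG:3857)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The equator/prime-meridian origin maps to (approximately) (0, 0) metres.
print(wgs84_to_web_mercator(0.0, 0.0))
```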

Does anyone know how I can achieve this same logic in Spark SQL on Databricks?

  • You will probably have to replace PostGIS with another library. [This link](https://databricks.com/de/session_na20/geospatial-options-in-apache-spark) is a good starting point for your research – werner Sep 06 '21 at 14:08

1 Answer

To achieve this you need to use a library such as Apache Sedona or GeoMesa. Sedona, for example, has an ST_Transform function, and it may cover the rest as well.

The only thing you need to take care of is that if you're using pure SQL, then on Databricks you will need to:

  • install the Sedona libraries using an init script, so the libraries are in place before Spark starts
  • set the Spark configuration parameters, as described in the following pull request
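With Sedona installed and its SQL extensions registered, the PostGIS expression above maps roughly to the following. This is a sketch: Sedona's `ST_Transform` takes source and target CRS codes as string arguments rather than a bare target SRID, and depending on the Sedona version you may additionally need `ST_FlipCoordinates` to deal with axis ordering. The table and column names are hypothetical:

```sql
-- Hypothetical table `points` with double columns x (longitude) and y (latitude)
SELECT ST_Transform(ST_Point(x, y), 'epsg:4326', 'epsg:3857') AS geom_3857
FROM points;
```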

Update, June 2022: people at Databricks have developed the Mosaic library, which is heavily optimized for geospatial analysis on Databricks, and it's compatible with the standard ST_ functions.
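Because Mosaic exposes the standard ST_ names, the query should stay close to the PostGIS original. This is a sketch under the assumption that Mosaic provides `st_point`, `st_setsrid`, and `st_transform` with PostGIS-like signatures; the table and column names are hypothetical:

```sql
-- Hypothetical table `points` with double columns x (longitude) and y (latitude)
SELECT st_transform(st_setsrid(st_point(x, y), 4326), 3857) AS geom_3857
FROM points;
```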

Alex Ott