GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark / Spark SQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
Questions tagged [geospark]
45 questions
0 votes · 1 answer
How to create a PolygonRDD from H3 boundary?
I'm using Apache Spark with Apache Sedona (previously called GeoSpark), and I'm trying to do the following:
Take a DataFrame containing latitude and longitude in each row (it comes from an arbitrary source; it is neither a PointRDD nor comes from a…
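The excerpt is cut off, but a common route from an H3 cell to something GeoSpark/Sedona can use is to turn the cell's boundary vertices into a WKT polygon string and then apply ST_GeomFromWKT. A minimal sketch, assuming you already have the boundary as (lat, lng) pairs (e.g. from the h3 library's cell_to_boundary; `boundary_to_wkt` is a made-up helper name):

```python
def boundary_to_wkt(boundary):
    """Turn a ring of (lat, lng) pairs into a WKT POLYGON string.

    WKT expects 'lng lat' order and a closed ring (first vertex
    repeated at the end), so we reorder and close if necessary.
    """
    ring = [(lng, lat) for lat, lng in boundary]
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ", ".join(f"{lng} {lat}" for lng, lat in ring)
    return f"POLYGON(({coords}))"

# Toy triangle standing in for an H3 cell boundary.
print(boundary_to_wkt([(33.8, -7.07), (33.9, -7.0), (33.85, -6.95)]))
# → POLYGON((-7.07 33.8, -7.0 33.9, -6.95 33.85, -7.07 33.8))
```

A column of such strings can then be turned into geometries with ST_GeomFromWKT in spatial SQL, sidestepping the need to construct a PolygonRDD by hand.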

PiFace · 526

0 votes · 0 answers
Why are there unexpected differences between Apache Sedona's ST_Buffer and ST_Distance?
I'm looking to join two tables with Apache Sedona (formerly GeoSpark) and getting unexpected differences between two approaches. In particular ST_Distance seems to produce some strange results, and I can't figure out if it's an issue with Sedona or…

houseofleft · 347

0 votes · 1 answer
GeoSpark: showing SQL results fails
I am using GeoSpark 1.3.1, where I am trying to find all geo points that are contained in a POLYGON. I use the SQL command:
val result = spark.sql(
|SELECT *
|FROM spatial_trace, streetCrossDf
|WHERE ST_Within (streetCrossDf.geometry,…

Arnold · 5

0 votes · 1 answer
Getting GeoSpark error with upload_jars function
I'm trying to run GeoSpark on an AWS EMR cluster. The code is:
# coding=utf-8
from pyspark.sql import SparkSession
import pyspark.sql.functions as f
import pyspark.sql.types as t
from geospark.register import GeoSparkRegistrator
from geospark.utils…

Shadowtrooper · 1,372

0 votes · 1 answer
Assertion failed on Spark using GeoSpark
I have the following dataframe:
+--------------+-------------------+---------------------+
|longitude_f | latitude_f | geom |
+--------------+-------------------+---------------------+
|7.0737816 |33.82666 | …
user13117513

0 votes · 0 answers
org.apache.spark.sql.types.Decimal cannot be cast to org.apache.spark.unsafe.types.UTF8String ___ using Spark / Java
I have the following dataframe:
+-------------+-----------------+------------------+
|longitude |latitude |geom |
+-------------+-----------------+------------------+
|-7.07378166 |33.826661 [00 00 00 00 01…

HBoulmi · 333

0 votes · 1 answer
ST_GeomFromText assertion failed using Spark Java
When I run the following code:
Dataset df = sparkSession.sql("select -7.07378166 as longitude, 33.826661 as latitude");
df.withColumn("ST_Geomfromtext ",
expr("ST_GeomFromText(CONCAT('POINT(',longitude,' ',latitude,')'),4326)"))
…

HBoulmi · 333

0 votes · 1 answer
GeoSpark transform SQL function fails
I am using GeoSpark 1.3.1, where I am trying to find all geo points that are contained in a circle, given a center and a radius in meters. To do this I want to translate the center from degrees to meters, create the circle (using ST_Buffer), and then…
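The truncated plan in this excerpt runs into a classic pitfall worth spelling out: in GeoSpark 1.x, ST_Buffer buffers in the units of the coordinates, so with lon/lat data a radius given in meters must either be converted to (approximate) degrees or the geometry reprojected to a metric CRS with ST_Transform first. A rough spherical-Earth conversion, as a pure-Python sketch (the `meters_to_degrees_*` helper names are made up):

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters

def meters_to_degrees_lat(radius_m):
    """Degrees of latitude spanned by radius_m (same at every latitude)."""
    return radius_m / EARTH_RADIUS_M * 180.0 / math.pi

def meters_to_degrees_lon(radius_m, latitude_deg):
    """Degrees of longitude spanned by radius_m; a degree of longitude
    shrinks by cos(latitude), so the figure grows toward the poles."""
    return meters_to_degrees_lat(radius_m) / math.cos(math.radians(latitude_deg))

# ~111.2 km is one degree of latitude; at 60°N it spans two degrees of longitude.
print(round(meters_to_degrees_lat(111195), 3))       # → 1.0
print(round(meters_to_degrees_lon(111195, 60.0), 3)) # → 2.0
```

Using the larger (longitude) figure as the ST_Buffer radius yields a circle covering at least the metric radius; for precise distances, ST_Transform into a projected CRS is the safer route.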

aweis · 5,350

0 votes · 1 answer
ClassNotFoundException geosparksql.UDT.GeometryUDT
I have been trying to convert a GeoPandas dataframe to PySpark Dataframe with no success. Currently, I have extended the DataFrame class to convert a GPD DF to Spark DF with the following:
from pyspark.sql import DataFrame
from pyspark.sql.types…

minimino · 93

0 votes · 1 answer
I am using IntelliJ IDEA to develop a Spark demo. How do I set the Spark memory size parameters in local mode?
I am running a GeoSpark demo in local mode, not standalone. The data size is about 5 GB, and I am getting an OOM error. I want to change the Spark memory in local mode; how can I do it?
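Worth noting for this one: with master `local[*]` the driver and executors share a single JVM, so `spark.executor.memory` has no effect, and `spark.driver.memory` is only honored if it is set before that JVM starts (on the command line or in spark-defaults.conf), not from SparkConf inside already-running code. A configuration sketch (the 8g value is illustrative):

```
# spark-defaults.conf — or equivalently: spark-submit --driver-memory 8g …
spark.driver.memory  8g
```

When the demo is launched directly from IntelliJ IDEA, the JVM is started by the IDE, so the heap is governed by the run configuration's VM options (e.g. -Xmx8g) rather than by Spark settings.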

谭凯中 · 1

0 votes · 1 answer
Maven package error using geospark library
Currently, I am working on a geospatial analytics use case and I am using Spark 2.4.0 along with the GeoSpark library. When I try to create the application jar file using Eclipse, it gives me the below error. Could you please help me to resolve the…

Sumit D · 171

0 votes · 1 answer
Transform JavaPairRDD to DataFrame using Scala
I have a JavaPairRDD in the below format:
org.apache.spark.api.java.JavaPairRDD[com.vividsolutions.jts.geom.Geometry,com.vividsolutions.jts.geom.Geometry]
Key is a polygon and value is a point in the polygon
e.g.:
[(polygon(1,2,3,4), POINT…

user2981952 · 41

0 votes · 1 answer
How to avoid "GC overhead limit exceeded" in a range query with GeoSpark?
I am using Spark 2.4.3 with the extension of GeoSpark 1.2.0.
I have two tables to join as range distance. One table (t1) is ~100K rows with only one column, a GeoSpark geometry. The other table (t2) is ~30M rows and is composed of an…

Randomize · 8,651

-1 votes · 1 answer
How do I convert a geometry column from binary format to string format in a pyspark dataframe?
Here is my attempt at this:
%sql SELECT df1.*,df1.geometry.STAsText() as geom_text FROM df_geo df1.
This obviously fails because it is not a table but a dataframe. How can one do this using PySpark or GeoSpark?
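In GeoSpark/Sedona SQL the direct tool for this is ST_AsText over a registered geometry column (after exposing the dataframe as a temp view). To make clear what the binary column actually contains, here is a minimal pure-Python decoder for a WKB point — the `00 00 00 00 01 …` byte pattern visible in another question on this page (`wkb_point_to_wkt` is a made-up helper name):

```python
import struct

def wkb_point_to_wkt(wkb: bytes) -> str:
    """Decode a WKB Point into a 'POINT(x y)' WKT string.

    WKB layout: 1 byte-order flag (0 = big-endian, 1 = little-endian),
    a 4-byte geometry type code (1 = Point), then two 8-byte IEEE doubles.
    """
    endian = ">" if wkb[0] == 0 else "<"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a WKB Point (type={geom_type})")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return f"POINT({x} {y})"

# Big-endian WKB for POINT(-7.07378166 33.826661)
wkb = bytes([0]) + struct.pack(">Idd", 1, -7.07378166, 33.826661)
print(wkb_point_to_wkt(wkb))  # → POINT(-7.07378166 33.826661)
```

In Spark itself, once the spatial functions are registered, the equivalent is a one-line `SELECT ST_AsText(geometry) …` over the temp view rather than hand-decoding bytes.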

Mazil_tov998 · 396

-1 votes · 1 answer
PySpark: why does the ST_Intersects function return duplicated rows?
I am using the ST_Intersects function of GeoSpark to make the intersection between points and polygons.
queryOverlap = """
SELECT p.ID, z.COUNTYNS as zone, p.date, timestamp, p.point
FROM gpsPingTable as p, zoneShapes as z
…
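The excerpt stops before the join condition, but the usual cause here is not a bug: a join on ST_Intersects emits one output row per matching (point, polygon) pair, so a ping that falls inside two overlapping zones, or that appears twice in the input, comes back twice; SELECT DISTINCT or deduplicating the inputs is the usual fix. A toy pure-Python illustration with bounding boxes standing in for real geometries (`bbox_contains` and `spatial_join` are made-up helpers):

```python
def bbox_contains(bbox, point):
    """Crude stand-in for ST_Intersects: point in axis-aligned box."""
    minx, miny, maxx, maxy = bbox
    x, y = point
    return minx <= x <= maxx and miny <= y <= maxy

def spatial_join(points, zones):
    """Emit one row per matching (point, zone) pair, like a SQL join."""
    return [(pid, zid)
            for pid, pt in points
            for zid, bb in zones
            if bbox_contains(bb, pt)]

points = [("p1", (0.5, 0.5))]
# Two zones overlap where p1 sits, so p1 shows up in two output rows.
zones = [("z1", (0, 0, 1, 1)), ("z2", (0.4, 0.4, 2, 2))]
print(spatial_join(points, zones))  # → [('p1', 'z1'), ('p1', 'z2')]
```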

emax · 6,965