Questions tagged [geospark]

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / Spark SQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) / Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. The project has since been donated to the Apache Software Foundation and continues development as Apache Sedona.

45 questions
0 votes • 1 answer

How to create a PolygonRDD from H3 boundary?

I'm using Apache Spark with Apache Sedona (previously called GeoSpark), and I'm trying to do the following: take a DataFrame containing latitude and longitude in each row (it comes from an arbitrary source; it is neither a PointRDD nor does it come from a…
PiFace • 526 • 3 • 19
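One way to approach this (a sketch, not the asker's code) is to turn each H3 cell boundary into WKT and let Sedona build the geometry; this assumes the h3 Python package with its v3 API (`h3_to_geo_boundary`), Sedona's Python `Adapter`, and an existing SparkSession `spark` with Sedona's SQL functions registered:

```python
# Sketch: H3 cells -> polygon geometries -> SpatialRDD, assuming the h3 v3 API
# and Sedona (ex-GeoSpark) SQL functions are already registered on `spark`.
import h3
from sedona.utils.adapter import Adapter  # assumption: Sedona's Python Adapter is available

def h3_to_wkt(cell):
    # h3_to_geo_boundary returns (lat, lng) pairs; WKT expects "lng lat" and a closed ring.
    ring = list(h3.h3_to_geo_boundary(cell))
    ring.append(ring[0])
    coords = ", ".join(f"{lng} {lat}" for lat, lng in ring)
    return f"POLYGON(({coords}))"

cells = ["8928308280fffff"]  # example H3 index
df = spark.createDataFrame([(c, h3_to_wkt(c)) for c in cells], ["h3", "wkt"])
df.createOrReplaceTempView("h3_cells")

polygons = spark.sql("SELECT h3, ST_GeomFromWKT(wkt) AS geom FROM h3_cells")
spatial_rdd = Adapter.toSpatialRdd(polygons, "geom")  # only if an RDD rather than a DataFrame is needed
```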
0 votes • 0 answers

Why are there unexpected differences between Apache Sedona's ST_Buffer and ST_Distance?

I'm looking to join two tables with Apache Sedona (formerly GeoSpark) and I'm getting unexpected differences between two approaches. In particular, ST_Distance seems to produce some strange results, and I can't figure out whether it's an issue with Sedona or…
houseofleft • 347 • 1 • 12
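Without the full question it is hard to say, but a frequent source of such differences is units: with raw lon/lat (EPSG:4326) geometries both ST_Distance and ST_Buffer operate in degrees, and ST_Buffer only approximates a circle. A hedged sketch of the two join styles being compared, with made-up table and column names and an existing registered session `spark`:

```python
# Two ways to express "b is within r of a"; both use the geometry's native units
# (degrees for lon/lat data), and the buffered circle is a polygonal approximation,
# so the two predicates can disagree near the boundary.
by_distance = spark.sql("""
    SELECT a.id AS left_id, b.id AS right_id
    FROM left_table a, right_table b
    WHERE ST_Distance(a.geom, b.geom) < 0.01        -- 0.01 degrees, not meters
""")

by_buffer = spark.sql("""
    SELECT a.id AS left_id, b.id AS right_id
    FROM left_table a, right_table b
    WHERE ST_Intersects(ST_Buffer(a.geom, 0.01), b.geom)
""")
```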
0 votes • 1 answer

GeoSpark show SQL results fails

I am using GeoSpark 1.3.1, and I am trying to find all geo points that are contained in a POLYGON. I use the SQL command: val result = spark.sql( |SELECT * |FROM spatial_trace, streetCrossDf |WHERE ST_Within (streetCrossDf.geometry,…
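For reference, the same points-in-polygon query expressed in PySpark (a sketch only; the second ST_Within argument is guessed, since the excerpt is truncated, and the view names come from the excerpt):

```python
# Sketch of the ST_Within join from the excerpt; `spatial_trace.geometry` is an
# assumed column name because the excerpt cuts off before the second argument.
result = spark.sql("""
    SELECT *
    FROM spatial_trace, streetCrossDf
    WHERE ST_Within(streetCrossDf.geometry, spatial_trace.geometry)
""")
result.show(5, truncate=False)
```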
0 votes • 1 answer

Getting GeoSpark error with upload_jars function

I'm trying to run GeoSpark in an AWS EMR cluster. The code is: # coding=utf-8 from pyspark.sql import SparkSession import pyspark.sql.functions as f import pyspark.sql.types as t from geospark.register import GeoSparkRegistrator from geospark.utils…
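For context, a sketch of the registration pattern the excerpt is heading towards, as documented for the geospark Python package; on EMR the jars can also be provisioned on the cluster instead of via upload_jars:

```python
# Minimal GeoSpark setup in PySpark with the geospark 1.x package.
from pyspark.sql import SparkSession
from geospark.register import GeoSparkRegistrator, upload_jars

upload_jars()  # ship the bundled GeoSpark jars before the session starts

spark = SparkSession.builder.appName("geospark-on-emr").getOrCreate()
GeoSparkRegistrator.registerAll(spark)  # registers the ST_* SQL functions

spark.sql("SELECT ST_GeomFromWKT('POINT (1 2)')").show()
```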
0 votes • 1 answer

Assertion failed on spark using GeoSpark

I have the following dataframe: +--------------+-------------------+---------------------+ |longitude_f | latitude_f | geom | +--------------+-------------------+---------------------+ |7.0737816 |33.82666 | …
user13117513
0 votes • 0 answers

org.apache.spark.sql.types.Decimal cannot be cast to org.apache.spark.unsafe.types.UTF8String ___ using Spark / Java

I have the following dataframe: +-------------+-----------------+------------------+ |longitude |latitude |geom | +-------------+-----------------+------------------+ |-7.07378166 |33.826661 [00 00 00 00 01…
HBoulmi • 333 • 5 • 16
0 votes • 1 answer

st_geomfromtext assertion failed using spark java

When I run the following code Dataset df = sparkSession.sql("select -7.07378166 as longitude, 33.826661 as latitude"); df.withColumn("ST_Geomfromtext ", expr("ST_GeomFromText(CONCAT('POINT(',longitude,' ',latitude,')'),4326)")) …
HBoulmi • 333 • 5 • 16
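One thing to check (hedged, since the excerpt is truncated): GeoSpark 1.3.x's ST_GeomFromText takes only the WKT string, so the extra 4326 argument may be what trips the assertion. The documentation's own pattern builds points with ST_Point and explicit Decimal casts; a PySpark sketch on an existing registered session `spark`:

```python
# Sketch: building point geometries with ST_Point and explicit Decimal casts,
# the pattern shown in the GeoSpark 1.x docs, instead of CONCAT-ing WKT text.
from pyspark.sql.functions import expr

df = spark.sql("SELECT -7.07378166 AS longitude, 33.826661 AS latitude")
pts = df.withColumn(
    "geom",
    expr("ST_Point(CAST(longitude AS Decimal(24, 20)), CAST(latitude AS Decimal(24, 20)))"),
)
pts.show(truncate=False)
```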
0 votes • 1 answer

GeoSpark transform SQL function fails

I am using GeoSpark 1.3.1, and I am trying to find all geo points that are contained in a circle, given a center and a radius in meters. To do this I want to translate the center from degrees to meters, create the circle (using ST_Buffer), and then…
aweis • 5,350 • 4 • 30 • 46
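A sketch of that degree→meter→buffer pipeline with GeoSpark SQL, assuming lon/lat input and a metric CRS such as EPSG:3857; the table and column names are illustrative, and axis order for epsg:4326 varies between versions, so it is worth verifying the transform on a known point first:

```python
# Sketch: buffer by meters by reprojecting to a metric CRS first.
# `centers(lon, lat)` is an illustrative table of circle centers; `spark` is a
# SparkSession with the GeoSpark/Sedona SQL functions registered.
circles = spark.sql("""
    SELECT ST_Buffer(
             ST_Transform(
               ST_Point(CAST(lon AS Decimal(24, 20)), CAST(lat AS Decimal(24, 20))),
               'epsg:4326', 'epsg:3857'),
             1000.0) AS circle_geom            -- radius in meters once in EPSG:3857
    FROM centers
""")
```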
0 votes • 1 answer

ClassNotFoundException geosparksql.UDT.GeometryUDT

I have been trying to convert a GeoPandas dataframe to a PySpark DataFrame with no success. Currently, I have extended the DataFrame class to convert a GPD DF to a Spark DF with the following: from pyspark.sql import DataFrame from pyspark.sql.types…
minimino • 93 • 3 • 11
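One way to sidestep the GeometryUDT class lookup entirely is to move the geometries across as WKT and rebuild them on the Spark side; a sketch assuming GeoSpark/Sedona's SQL functions are registered on `spark` and a hypothetical input file:

```python
# Sketch: GeoPandas -> Spark via WKT, avoiding direct GeometryUDT serialization.
import geopandas as gpd
from pyspark.sql.functions import expr

gdf = gpd.read_file("zones.shp")  # hypothetical input
pdf = gdf.drop(columns="geometry").assign(wkt=gdf.geometry.apply(lambda g: g.wkt))

sdf = spark.createDataFrame(pdf)
sdf = sdf.withColumn("geom", expr("ST_GeomFromWKT(wkt)")).drop("wkt")
```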
0 votes • 1 answer

I am using IDEA to develop a Spark demo. How do I set the Spark memory parameters in local mode?

I am running a GeoSpark demo in local mode, not standalone. The data size is about 5 GB, and I am getting an OOM error. I want to change the Spark memory in local mode; how do I do it?
谭凯中 • 1 • 1
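In local mode everything runs inside the driver JVM, so spark.driver.memory is the knob, and it has to be set before that JVM starts; when launching straight from the IDE, the equivalent is the run configuration's -Xmx / VM options. A PySpark sketch:

```python
# Sketch: raising memory for a local-mode run. Driver memory must be configured
# before the JVM is launched; setting it on an already-running session has no effect.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("geospark-local")
    .config("spark.driver.memory", "8g")          # all work runs in the driver in local mode
    .config("spark.driver.maxResultSize", "2g")   # optional, for large collect()s
    .getOrCreate()
)
```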
0 votes • 1 answer

Maven package error using geospark library

Currently, I am working on a geospatial analytics use case and am using Spark 2.4.0 along with the GeoSpark library. When I try to create the application JAR file using Eclipse, it gives me the below error. Could you please help me to resolve the…
Sumit D • 171 • 1 • 3 • 14
0 votes • 1 answer

Transform a JavaPairRDD to a DataFrame using Scala

I have a JavaPairRDD in the following format: org.apache.spark.api.java.JavaPairRDD[com.vividsolutions.jts.geom.Geometry,com.vividsolutions.jts.geom.Geometry]. The key is a polygon and the value is a point in the polygon, e.g. [(polygon(1,2,3,4), POINT…
0 votes • 1 answer

How to avoid gc overhead limit exceeded in a range query with GeoSpark?

I am using Spark 2.4.3 with the GeoSpark 1.2.0 extension. I have two tables to join on a range distance. One table (t1) is ~100K rows with only one column, which is a GeoSpark geometry. The other table (t2) is ~30M rows and is composed of an…
Randomize • 8,651 • 18 • 78 • 133
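One mitigation GeoSpark's own docs recommend is Kryo serialization with the GeoSpark registrator, which shrinks geometries considerably during joins; a hedged sketch (class names are for GeoSpark 1.x, Sedona ships its own registrator):

```python
# Sketch: enable Kryo with GeoSpark's registrator to cut GC pressure in spatial joins.
from pyspark.sql import SparkSession
from geospark.register import GeoSparkRegistrator

spark = (
    SparkSession.builder
    .appName("range-distance-join")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator",
            "org.datasyslab.geospark.serde.GeoSparkKryoRegistrator")
    .getOrCreate()
)
GeoSparkRegistrator.registerAll(spark)
```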
-1 votes • 1 answer

How do I convert a geometry column from binary format to string format in a pyspark dataframe?

Here is my attempt at this: %sql SELECT df1.*,df1.geometry.STAsText() as geom_text FROM df_geo df1. This obviously fails because it is not a table, but a dataframe. How can one do this using pyspark or geospark?
Mazil_tov998 • 396 • 1 • 13
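Assuming the binary column holds WKB, GeoSpark/Sedona's SQL functions can do this directly on the DataFrame instead of T-SQL's STAsText(); if the column is already a GeoSpark geometry, ST_AsText alone is enough. A sketch, with the DataFrame and column names taken from the excerpt:

```python
# Sketch: WKB binary column -> WKT text, assuming the ST_* functions are registered.
from pyspark.sql.functions import expr

df_text = df_geo.withColumn(
    "geom_text",
    expr("ST_AsText(ST_GeomFromWKB(geometry))")  # drop ST_GeomFromWKB if `geometry` is already a geometry
)
df_text.show(truncate=False)
```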
-1 votes • 1 answer

Pyspark: why does the ST_intersects function return duplicated rows?

I am using the ST_Intersects function of geospark to compute the intersection between points and polygons. queryOverlap = """ SELECT p.ID, z.COUNTYNS as zone, p.date, timestamp, p.point FROM gpsPingTable as p, zoneShapes as z …
emax • 6,965 • 19 • 74 • 141
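Duplicates are expected whenever a ping matches more than one polygon (overlapping zones, or points exactly on a shared boundary). A sketch of de-duplicating after the join; z.geometry is an assumed column name, since the excerpt is truncated, and the other names come from the excerpt:

```python
# Sketch: point-in-polygon join, then collapse pings that landed in several zones.
joined = spark.sql("""
    SELECT p.ID, z.COUNTYNS AS zone, p.date, p.timestamp, p.point
    FROM gpsPingTable AS p, zoneShapes AS z
    WHERE ST_Intersects(z.geometry, p.point)
""")

# If exactly one row per ping is wanted even when zones overlap:
deduped = joined.dropDuplicates(["ID", "timestamp"])
```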