0

I am trying to use the Hadoop GIS Framework, in order to add Spatial support to hive. One of the things I want to do is to create a spatial table from external data (from PostGIS). Unfortunately, the serializer provided by ESRI maps to a ESRI JSON format, rather than standards such as WKT, GeoJSON. What I ended up doing, was a bit of a workaround.

The first thing, was to export my PostGIS data as a tab separated file, transforming the geometric field into GeoJSON.

\COPY (select id, ST_AsGeoJSON(geom) from grid_10) TO '/tmp/grid_10.geojson';

Then I put it somewhere in the S3 filesystem, and loaded it using the csv serializer. It created a table with two fields: and integer, and text (containing GeoJSON).

CREATE EXTERNAL TABLE grid_10 (id bigint, json STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://some-url/data/grids/geojson';

enter image description here

I can generate geometry correctly from this GeoJSON, using this query:

SELECT ID, ST_AsText(ST_GeomFromGeoJSON(json)) from grid_10 limit 3;

Which outputs:

enter image description here

Now I wanted to convert this table into an actual spatial table, where geometry is stored as a BLOB, rather than some text. I did it with the following query:

create table new_grid as SELECT ID, ST_GeomFromGeoJSON(json) as geom from grid_10;  

To my surprise, the content of this table is the same geometry, repeated over and over.

enter image description here

I tried the same approach - creating a geometry from a WKT/GeoJSON and writing it into a table - with the same results. Is this a bug? Does it mean, I am condemned to perform spatial queries using conversions-on-the-fly, and by the way isn't it much costly in computational terms than manipulating BLOBs?

create table grid_cnt as 
SELECT grid_10.id, count(grid_10.id) as ptcnt FROM grid_10 JOIN tweets WHERE     ST_Contains(ST_GeomFromGeoJSON(grid_10.json),ST_Point(tweets.longitude, tweets.latitude))=true GROUP BY grid_10.id;

I was wondering if anybody has experienced the same issues.

Update: This problem was happening with Hive 0.11, running on Amazon Hadoop's Distribution 3.3.1. I was also pulling the ESRI jars, from this link:

https://github.com/Esri/gis-tools-for-hadoop/archive/master.zip

When I switched to the jar 2.0, and the latest hive (0.13), the problem disappeared.

You can find my issue report here. Hope this helps someone experiencing the same issues.

doublebyte
  • 1,225
  • 3
  • 13
  • 22

1 Answers1

0

I went through same issues described by you above...The solution from some expert that I got was to stored your geometry information in wkt i.e. text format instead of geometry format which you have tried.

  • Yes, this is what I am doing, but I thought it would be computationally inefficient to do the geometric instantiations on the fly... is this a bug?? – doublebyte Nov 27 '14 at 14:59
  • I don't know weather it is a bug or not but...wkt can also be directly analyzed using the spatial analysis function so at the same time we don't require the geometric instantiations. – Ranjeet Paswan Nov 28 '14 at 14:30
  • I see. In that case, wouldn't it be better to not provide the binary representations at all? I think that it is better to not provide something, specially if it is not essential, than to provide something misleading/erroneous... – doublebyte Dec 01 '14 at 08:11
  • 1
    same case i struggled for 1 week...i created binary type for geometries in postgis loaded the same into hcataloge with binary data type ..but esri function didn't worked for it...came to the same zero position..!! – Ranjeet Paswan Dec 01 '14 at 13:02
  • What version of Hive are you using? – Mike Park Dec 03 '14 at 21:39
  • my current version hive is 0.13 – Ranjeet Paswan Dec 04 '14 at 10:21