0

I have a HIVE table called "favoriteshop"(shopname, wkt) with 10 locations and its wkt(Well-known text). I also have another table called "city"(cityname, wkt) with all cities and the city's full wkt. I want to do spatial joins to the 2 tables to see if they spatially intersects each other. Below is my query:

SELECT a.shopname, a.wkt, b.cityname
FROM favoriteshop a, city b
WHERE ST_Intersects(ST_GeomFromText(a.wkt), ST_GeomFromText(b.wkt)) = true

Is there a more efficient way of doing this? it feels like it is a full table scan on city table which is problematic as city can be huge(and let's pretend it can be over millions or billions of record). Thanks for your suggestions!

R.C
  • 573
  • 3
  • 7
  • 19
  • See also https://stackoverflow.com/questions/38963487/how-to-optimize-scan-of-1-huge-file-table-in-hive-to-confirm-check-if-lat-long – Randall Whitman Mar 19 '18 at 16:44

1 Answers1

0

calculate ST_GeomFromText in the subqueries and move condition to the ON clause:

SELECT a.shopname, a.wkt, b.cityname
  FROM ( select a.shopname, a.wkt, ST_GeomFromText(a.wkt) Geom from favoriteshop a ) a
     INNER JOIN 
       (selectb.cityname, ST_GeomFromText(b.wkt) Geom from  city b ) b
               on ST_Intersects(a.Geom, b.Geom) = true;
leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Thanks for your suggestion, may I ask what the rationale behind moving it to the ON clause is? how does that differs from doing what I did in the original question? Thanks! – R.C Mar 19 '18 at 18:26
  • @R.C I noticed that the optimizer often transforms such queries with where to the cross join + where after join(on the next stage). and with the ON clause it will be transformed to the map join. Also the reason is that it is more readable, ANSI Join. – leftjoin Mar 19 '18 at 19:16