I have some existing IPython code that uses Shapely to match points to the polygons that contain them.
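The code is roughly of this shape (a minimal sketch only; the polygon names and coordinates here are made up, and the real code loads actual geospatial data):

```python
from shapely.geometry import Point, Polygon

# Hypothetical polygons; placeholders for the real geospatial data.
POLYGONS = {
    "zone_a": Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
    "zone_b": Polygon([(4, 0), (8, 0), (8, 4), (4, 4)]),
}

def match_point(lon, lat):
    """Return the name of the first polygon containing the point, or None."""
    p = Point(lon, lat)
    for name, poly in POLYGONS.items():
        if poly.contains(p):
            return name
    return None
```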
Now I'm attempting to port the code to PySpark (version 1.4) on Bluemix. Running the line below fails:
!pip install --user shapely
The error message reads:
```
Collecting shapely
  Using cached Shapely-1.5.13.tar.gz
    Complete output from command python setup.py egg_info:
    Failed `CDLL(libgeos_c.so.1)`
    Failed `CDLL(libgeos_c.so)`
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/tmp/pip-build-ylMKmC/shapely/setup.py", line 38, in <module>
        from shapely._buildcfg import geos_version_string, geos_version, \
      File "shapely/_buildcfg.py", line 167, in <module>
        fallbacks=['libgeos_c.so.1', 'libgeos_c.so'])
      File "shapely/_buildcfg.py", line 161, in load_dll
        libname, fallbacks or []))
    OSError: Could not find library geos_c or load any of its variants ['libgeos_c.so.1', 'libgeos_c.so']
```
Apparently Shapely depends on the GEOS C library (`libgeos_c`), and I don't know whether installing GEOS on Bluemix is possible, or whether that's even the right way forward here.

So my real question is: what is the best approach for performing point-in-polygon matching on geospatial data in PySpark? Any experiences?
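For context, what I had in mind (assuming Shapely could be made available on the executors) is to broadcast the polygons and match points per partition. This is only a sketch with made-up polygons, not working Bluemix code, and `run` is never called here since it needs a live SparkContext:

```python
from shapely.geometry import Point, Polygon

# Hypothetical polygon table; the real data would be loaded from storage.
POLYGONS = [
    ("zone_a", Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])),
    ("zone_b", Polygon([(4, 0), (8, 0), (8, 4), (4, 4)])),
]

def match_partition(points, polygons):
    """Match an iterable of (lon, lat) pairs against a polygon list.

    Yields ((lon, lat), polygon_name_or_None) for each input point.
    """
    for lon, lat in points:
        p = Point(lon, lat)
        hit = next((name for name, poly in polygons if poly.contains(p)), None)
        yield ((lon, lat), hit)

def run(sc, points_rdd):
    """Driver-side sketch: broadcast the polygons, match per partition.

    Not executed here -- requires a SparkContext and Shapely installed
    on every executor, which is exactly the part that fails on Bluemix.
    """
    polys = sc.broadcast(POLYGONS)
    return points_rdd.mapPartitions(lambda it: match_partition(it, polys.value))
```

The point of `mapPartitions` plus a broadcast variable is to serialize the polygon list to each executor once rather than once per point.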
Thank you
/Henrik