I have some existing IPython code that uses Shapely for matching points within polygons.

Now I'm attempting to port the code to PySpark (version 1.4) on Bluemix.

Running the line below fails:

!pip install --user shapely

The error message reads:

Collecting shapely
  Using cached Shapely-1.5.13.tar.gz
    Complete output from command python setup.py egg_info:
    Failed `CDLL(libgeos_c.so.1)`
    Failed `CDLL(libgeos_c.so)`
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/tmp/pip-build-ylMKmC/shapely/setup.py", line 38, in <module>
        from shapely._buildcfg import geos_version_string, geos_version, \
      File "shapely/_buildcfg.py", line 167, in <module>
        fallbacks=['libgeos_c.so.1', 'libgeos_c.so'])
      File "shapely/_buildcfg.py", line 161, in load_dll
        libname, fallbacks or []))
    OSError: Could not find library geos_c or load any of its variants ['libgeos_c.so.1', 'libgeos_c.so']

Apparently Shapely depends on the GEOS C library. I don't know whether installing GEOS on Bluemix is possible, or whether that is even the way forward here.

So my real question is: what is the best approach for point-in-polygon matching on geospatial data in PySpark? Any experiences?

Thank you

/Henrik


1 Answer


This is how you can get the GEOS libraries into the right path for the Shapely installation:

# download and build GEOS from source, installing into $HOME/geos-bin
wget http://download.osgeo.org/geos/geos-3.5.0.tar.bz2
tar jxf geos-3.5.0.tar.bz2
cd geos-3.5.0 && ./configure --prefix=$HOME/geos-bin && make && make install

# copy the built libraries where the dynamic loader can find them
# (here $HOME is /home/hadoop, as on EMR; adjust for your environment)
sudo cp /home/hadoop/geos-bin/lib/* /usr/lib
sudo /bin/sh -c 'echo "/usr/lib" >> /etc/ld.so.conf'
sudo /bin/sh -c 'echo "/usr/local/lib" >> /etc/ld.so.conf'
sudo /sbin/ldconfig
sudo /bin/sh -c 'echo -e "\nexport LD_LIBRARY_PATH=/usr/lib" >> /home/hadoop/.bashrc'
source /home/hadoop/.bashrc
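Once the library loads, the point-in-polygon matching itself is just an ordinary filter that PySpark can apply per record, e.g. `sc.parallelize(points).filter(...)`. As a minimal sketch of the underlying test (the unit square and sample points are made-up data; with Shapely installed you would use `Polygon(...).contains(Point(x, y))` instead of this hand-rolled ray-casting predicate):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting (even-odd) test: cast a ray to the right of (x, y)
    and count how many polygon edges it crosses; odd count = inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# made-up unit square and sample points
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
points = [(0.5, 0.5), (1.5, 0.5), (0.25, 0.75)]

matches = [p for p in points if point_in_polygon(p[0], p[1], square)]
# in PySpark this becomes:
#   sc.parallelize(points).filter(lambda p: point_in_polygon(p[0], p[1], square))
print(matches)  # [(0.5, 0.5), (0.25, 0.75)]
```

Because the predicate is a plain Python function, it ships to the workers in the task closure with no extra setup; the only requirement is that GEOS/Shapely (if used instead) be installed on every worker node.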
Hussain Bohra