I'm setting up a data science environment on a laptop running Ubuntu 14.04 LTS, following the instructions in Chapter 2 of Agile Data Science by Russell Jurney.
I need to configure PySpark to talk to MongoDB using the mongo-hadoop package.
So far, so good. The mongo-hadoop git repo currently resides in my home directory, and PyMongo is already installed.
Building mongo-hadoop-spark.jar also seems to have gone well. I placed a copy on PySpark's system classpath, and the PySparkShell indicates that the .jar is in place:
However, PySpark is still having trouble finding the package:
This is my first attempt at installing these tools, and I'm floundering. Suggestions?
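
In case it helps with diagnosis, here is a minimal sketch of a check I can run from the PySpark shell to see whether the Python side can be imported at all, as opposed to the .jar on the JVM side. The module name `pymongo_spark` is my assumption about what mongo-hadoop's Spark support exposes to Python, and `check_importable` is just a helper name I made up for this post.

```python
import importlib
import sys


def check_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# "pymongo_spark" is an assumed module name; swap in whatever the error message names.
for name in ("pymongo", "pymongo_spark"):
    status = "importable" if check_importable(name) else "NOT importable"
    print("{0}: {1}".format(name, status))

# Directories Python searches for modules, for comparison with where the
# Python package from the repo actually lives.
for entry in sys.path:
    print(entry)
```

If a module shows up as not importable, I would take that to mean the problem is on the Python path rather than the Java classpath, but I may be misreading the situation.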