I'm setting up a data science environment on a laptop running Ubuntu 14.04 LTS, following the instructions in Chapter 2 of Agile Data Science by Russell Jurney.

I need to configure PySpark to talk to MongoDB using the mongo-hadoop package.

So far, so good. The mongo-hadoop git repo currently resides in my home directory, and PyMongo is already installed.

mongo-hadoop installation:

Building mongo-hadoop-spark.jar also seems to have gone well. I placed a copy on my PySpark system class path, and the PySpark shell indicates the .jar is in place:

mongo-hadoop-spark jar file

However, PySpark is still having trouble finding the package:

PySpark session
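To narrow this down, a minimal check along these lines may help separate a Python-side import problem from a JVM class path problem. It is only a sketch under assumptions: the module name `pymongo_spark` (the Python helper shipped in the mongo-hadoop repo) is a guess at which package is failing, and reading the `SPARK_CLASSPATH` environment variable is a guess at how the .jar was added. The helper names `module_available` and `jars_matching` are made up for illustration.

```python
import importlib
import os


def module_available(name):
    """Return True if `name` can be imported by this Python interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def jars_matching(classpath, fragment):
    """Return the class path entries whose file name contains `fragment`."""
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [entry for entry in entries if fragment in os.path.basename(entry)]


if __name__ == "__main__":
    # Assumption: the failing package is the Python module pymongo_spark.
    for name in ("pymongo", "pymongo_spark"):
        print("%s importable: %s" % (name, module_available(name)))

    # Assumption: the jar was added through the SPARK_CLASSPATH variable.
    classpath = os.environ.get("SPARK_CLASSPATH", "")
    print("matching jars: %s" % jars_matching(classpath, "mongo-hadoop-spark"))
```

If the .jar shows up but the Python module does not import, the problem is likely on the Python path rather than the JVM class path; the .jar being visible to the JVM does not by itself make a Python package importable.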

This is my first attempt at installing these tools, and I'm floundering. Any suggestions?

Brian Piercy