How to tell PySpark where the pymongo-spark package is located?

Asked May 25 '18 at 18:14

I'm creating a data science environment on a laptop running Ubuntu 14.04 LTS, following the instructions in Chapter 2 of Agile Data Science by Russell Jurney.

I need to configure PySpark to talk to MongoDB using the mongo-hadoop package.

So far, so good: the mongo-hadoop git repo is cloned into my home directory, and PyMongo is already installed.

Building mongo-hadoop-spark.jar also seems to have gone well. I placed a copy in my PySpark system class path; the PySparkShell indicates the .jar is in place:

However, PySpark is still having trouble finding the package:

This is my first attempt at installing these tools, and I'm floundering. Suggestions?
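One way to rule out classpath problems is to pass the jar explicitly when launching PySpark, instead of relying on where a copy was placed. `--jars` and `--driver-class-path` are standard `pyspark`/`spark-submit` options; the jar location in the example is a hypothetical placeholder. This is a minimal sketch that only builds the command line, so it runs without Spark installed.

```python
import os
import shlex


def pyspark_launch_args(jar_path):
    """Return pyspark/spark-submit options that put one jar on the classpath.

    --jars adds the jar to the driver and executor classpaths;
    --driver-class-path also puts it on the driver JVM classpath at startup.
    """
    # Expand "~" and make the path absolute so Spark does not resolve it
    # relative to whatever directory the shell was started from.
    jar_path = os.path.abspath(os.path.expanduser(jar_path))
    return ["--jars", jar_path, "--driver-class-path", jar_path]


def pyspark_command(jar_path):
    """Render a shell-quoted `pyspark` invocation (shlex.join needs Python 3.8+)."""
    return shlex.join(["pyspark"] + pyspark_launch_args(jar_path))


if __name__ == "__main__":
    # Hypothetical location of the jar built from the mongo-hadoop repo.
    print(pyspark_command("~/mongo-hadoop-spark.jar"))
```

Running the printed command starts a shell with the jar attached. This only covers the JVM side; the `pymongo_spark` Python module is a separate file that the Python interpreter itself must be able to import.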

asked May 25 '18 at 18:14

Brian Piercy
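On the title question itself: `pymongo_spark` is a plain Python module, so the jar alone is not enough, and the interpreter running PySpark must be able to import it. The sketch below assumes the module sits at `spark/src/main/python/pymongo_spark.py` inside the mongo-hadoop checkout (verify this against your clone). `add_pymongo_spark_to_path` is a hypothetical helper, not part of any library.

```python
import importlib
import os
import sys


def add_pymongo_spark_to_path(mongo_hadoop_dir):
    """Hypothetical helper: make pymongo_spark importable in this Python process.

    Assumes the module lives at spark/src/main/python/pymongo_spark.py
    inside a mongo-hadoop checkout; adjust if your clone differs.
    """
    module_dir = os.path.abspath(
        os.path.join(os.path.expanduser(mongo_hadoop_dir), "spark", "src", "main", "python")
    )
    if not os.path.isfile(os.path.join(module_dir, "pymongo_spark.py")):
        raise FileNotFoundError(f"pymongo_spark.py not found in {module_dir}")
    # Prepend the directory once; an existing entry is left where it is.
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    # Drop cached directory listings so the new entry is picked up immediately.
    importlib.invalidate_caches()
    return module_dir
```

After calling it in the driver (e.g. with `"~/mongo-hadoop"` if that is where the repo was cloned), `import pymongo_spark` followed by `pymongo_spark.activate()` should work; the mongo-hadoop Spark README shows `activate()` as the entry point. Changing `sys.path` affects only the driver process. Executors also need the module, for example by exporting the same directory in `PYTHONPATH` before launching, or by shipping the file with `SparkContext.addPyFile`.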

You have to copy the jar file into the spark_home/jars directory, not into the python directory. – Ramesh Maharjan May 26 '18 at 02:25
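The comment's suggestion can be scripted. This is a minimal sketch, assuming a Spark 2.x layout where the launcher adds every jar in `$SPARK_HOME/jars` to the classpath; `copy_jar_to_spark_home` is a hypothetical helper name. If Spark is installed under a system directory, the copy may need elevated permissions.

```python
import os
import shutil


def copy_jar_to_spark_home(jar_path, spark_home=None):
    """Hypothetical helper: copy a jar into $SPARK_HOME/jars and return its new path.

    Assumes a Spark 2.x install, where jars in $SPARK_HOME/jars are on the
    classpath Spark builds at launch.
    """
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise RuntimeError("SPARK_HOME is not set and no spark_home was given")
    jars_dir = os.path.join(os.path.expanduser(spark_home), "jars")
    if not os.path.isdir(jars_dir):
        raise FileNotFoundError(f"{jars_dir} does not exist; check SPARK_HOME")
    # copy2 keeps file metadata and, when the target is a directory,
    # returns the full path of the copied file.
    return shutil.copy2(os.path.expanduser(jar_path), jars_dir)
```

After copying, restart the PySpark shell so the new jar is picked up.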
Have you tried [Getting Spark, Python, and MongoDB to work together](https://stackoverflow.com/q/33391840/8371915)? – Alper t. Turker May 26 '18 at 09:25

0 Answers