What is the best way to store a Spark DataFrame in MongoDB using Python? The code below works fine in the pyspark shell, but when I try to write the same program in Java or Scala, I get exceptions.
- PySpark version: 2.2.0
- MongoDB version: 3.4
- Python: 2.7
- Java: JDK 9
Here is my code:
from pyspark.sql import SparkSession

# Build a SparkSession with MongoDB input/output URIs configured.
my_spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()

# Load the CSV into a DataFrame, then write it to MongoDB through the
# mongo-spark connector; the "database"/"collection" options override
# the defaults from the output URI.
dataframe = my_spark.read.csv('auto-data.csv', header=True)
dataframe.write.format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("database", "auto") \
    .option("collection", "autod") \
    .save()
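From the connector documentation, I understand that the mongo-spark connector jar has to be available to the driver when running outside the shell, so I assume it should be declared in the session config. Is something like this sketch the right approach? (The package coordinates org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 are my guess for Spark 2.2 on Scala 2.11; adjust them if your build differs.)

from pyspark.sql import SparkSession

# Sketch: same session as above, but pulling in the MongoDB Spark
# connector explicitly via spark.jars.packages. The coordinates below
# are an assumption for Spark 2.2 / Scala 2.11.
spark = SparkSession.builder \
    .appName("myApp") \
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.11:2.2.0") \
    .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.coll") \
    .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.coll") \
    .getOrCreate()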
Here is a snapshot of my CSV data:
And these are the errors I get:
I also tried installing the mongo-spark library from GitHub, but I get the same result.
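Once the write goes through, this is the sanity check I plan to run, a minimal sketch that reuses the my_spark session from above (the database/collection names match my write call):

# Hypothetical sanity check: read the collection back through the
# connector and inspect it. Assumes the 'my_spark' session defined
# earlier and that the write to auto.autod succeeded.
readback = my_spark.read.format("com.mongodb.spark.sql.DefaultSource") \
    .option("database", "auto") \
    .option("collection", "autod") \
    .load()
readback.printSchema()
print(readback.count())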