I have a program that takes a dataframe and should save it into Elasticsearch. Here's what it looks like when I save the dataframe:
model_df.write.format("org.elasticsearch.spark.sql") \
    .option("pushdown", True) \
    .option("es.nodes", "example.server:9200") \
    .option("es.index.auto.create", True) \
    .mode("append") \
    .save("EPTestIndex/")
When I run my program, I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o96.save. : java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql. Please find packages at http://spark.apache.org/third-party-projects.html
I did some research and thought I needed a jar, so I added this configuration to my SparkSession:
from pyspark.sql import SparkSession, SQLContext

spark = SparkSession.builder \
    .config("jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar") \
    .getOrCreate()
sqlContext = SQLContext(spark)
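From what I read, the jar can also be supplied on the command line with spark-submit --jars, or via the PYSPARK_SUBMIT_ARGS environment variable before the SparkSession is created, but since I launch directly from PyCharm I tried the builder config above instead. Here is a minimal sketch of the environment-variable approach, assuming the same local jar path (it would differ on another machine):

import os

# Must be set before any Spark code runs in this Python process.
# The trailing "pyspark-shell" token is required when setting
# PYSPARK_SUBMIT_ARGS for a plain python process.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar "
    "pyspark-shell"
)

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()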
I initialize the SparkSession in main and do the Elasticsearch write in another package; that package takes the dataframe and runs the write command shown above (a minimal sketch of the layout follows below). However, even with the jar configured I am still getting the same ClassNotFoundException.
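To make the structure concrete, here is a minimal sketch of how the code is organized (module and function names are placeholders, not my actual project names):

# main.py -- entry point; builds the SparkSession with the jar config
from pyspark.sql import SparkSession
from writer import es_writer  # placeholder package/module name

def main():
    spark = SparkSession.builder \
        .config("jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar") \
        .getOrCreate()
    # Stand-in for the real dataframe my program builds.
    model_df = spark.createDataFrame([("a", 1)], ["id", "value"])
    es_writer.write_to_es(model_df)

if __name__ == "__main__":
    main()

# writer/es_writer.py -- receives the dataframe and runs the write shown above
def write_to_es(model_df):
    model_df.write.format("org.elasticsearch.spark.sql") \
        .option("pushdown", True) \
        .option("es.nodes", "example.server:9200") \
        .option("es.index.auto.create", True) \
        .mode("append") \
        .save("EPTestIndex/")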
What might be the issue? I am running this program in PyCharm; how can I set things up so that PyCharm is able to run it?