I have written the following PySpark code.
from pyspark.sql import SparkSession
import sys
import sklearn  # imported only to check that the module is available to Spark
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
print(sys.version_info)
When I run with:
spark-submit --master yarn --deploy-mode client test.py
it executes correctly. However, when I change --deploy-mode to "cluster", i.e.:
spark-submit --master yarn --deploy-mode cluster test.py
I get the following error, and I have no idea why it happens or how I can resolve it.
ImportError: No module named sklearn
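For context, my (possibly wrong) understanding is that in client mode the driver uses the Python interpreter on the machine I submit from, where scikit-learn is installed, while in cluster mode the driver itself runs on a YARN node whose interpreter may not have it. To compare the two environments, I am thinking of running a minimal diagnostic like the sketch below in both modes (the file name is hypothetical; it deliberately does not import sklearn so that it can run and report):

# Diagnostic sketch: report which interpreter and module search path the
# driver process actually uses, so client mode and cluster mode can be compared.
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(sys.executable)    # interpreter running the driver
print(sys.version_info)  # its Python version
print(sys.path)          # where it resolves imports from

In cluster mode I would expect to find this output in the YARN application logs (yarn logs -applicationId <application id>), if I understand the logging correctly.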
I have seen this post, but it did not help me.
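In case it narrows things down, here is the kind of executor-side check I assume would show whether the worker nodes can import sklearn at all (a sketch; the partition count is arbitrary, and it avoids importing sklearn on the driver):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

def executor_has_sklearn(_):
    # Returns True only if the import succeeds on the worker running this task
    try:
        import sklearn  # noqa: F401
        return True
    except ImportError:
        return False

# One boolean per element, evaluated on the executors
print(sc.parallelize([0, 1, 2, 3], 4).map(executor_has_sklearn).collect())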