
I'm using AWS SageMaker connected to an EMR cluster via Livy. With a "normal" session (default session config) the connection is created and the Spark context works fine, but when I add:

"spark.pyspark.python": "./ANACONDA/env_name/bin/python3",
"spark.yarn.dist.archives": "s3://<path>/env_name.tar.gz#ANACONDA"

The session is not created and an error is thrown:

Neither SparkSession nor HiveContext/SqlContext is available

If I remove the spark.pyspark.python line, it takes some time (because it is distributing the .tar.gz file to the executors) but it works: the session and Spark context are created (although I cannot use the environment in the .tar.gz). So I guess the problem has something to do with spark.pyspark.python.

Given that context: I'm trying to debug what's happening, and for that I want to check the Livy logs, but I cannot find them. I know they should be in S3 (https://aws.amazon.com/premiumsupport/knowledge-center/spark-driver-logs-emr-cluster/), but I cannot find them anywhere. Can anyone point me to the logs' location, or suggest any other way to debug the issue?
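In case it helps, this is roughly how I've been listing the cluster's S3 log location from the notebook while looking for the Livy logs. The bucket, prefix and cluster id are placeholders for my cluster's "Log URI", and the filter on `livy`/`containers` is just my guess about where the relevant files end up, based on the linked article:

```
import boto3

# Placeholders: the EMR cluster's "Log URI" bucket and <prefix>/<cluster-id>/
LOG_BUCKET = "my-emr-logs-bucket"     # assumption: replace with your log bucket
LOG_PREFIX = "logs/j-XXXXXXXXXXXXX/"  # assumption: replace with your prefix + cluster id

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Walk the whole log prefix and print anything that mentions Livy or the YARN
# containers, since driver/executor output should be shipped under the latter.
# Note that EMR only pushes logs to S3 every few minutes.
for page in paginator.paginate(Bucket=LOG_BUCKET, Prefix=LOG_PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if "livy" in key.lower() or "/containers/" in key:
            print(key)
```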

Luis Leal
  • Ever find a solution? Running into this same issue on AWS SageMaker with EMR – sammy Oct 03 '22 at 18:01
  • Yep, there were lots of changes (even a newer cluster was created recently) and I'm not sure what solved this, but my hypothesis is that the fix was adding the following property to the %%configure -f section in the notebook: "livy.rsc.server.connect.timeout":"600s" (see the sketch below); the bigger the conda environment, the larger that value should be – Luis Leal Oct 06 '22 at 20:59
  • Thanks for the follow-up. Configuration was also the issue for me. I specified a bad path to python `"spark.pyspark.python": "./environment/bin/python"`, whereas it should have been `"spark.pyspark.python": "python3"`. I could have also excluded this config param entirely, since python3 is currently the default for my environment. – sammy Oct 07 '22 at 17:56
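Putting the comments together, the workaround would look roughly like this in the %%configure -f cell. This is only a sketch: the timeout value comes from the comment above, and placing it inside "conf" (rather than elsewhere in the JSON) is my assumption:

```
%%configure -f
{
    "conf": {
        "spark.pyspark.python": "./ANACONDA/env_name/bin/python3",
        "spark.yarn.dist.archives": "s3://<path>/env_name.tar.gz#ANACONDA",
        "livy.rsc.server.connect.timeout": "600s"
    }
}
```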

0 Answers