I am using this quickstart guide (https://github.com/aws-quickstart/quickstart-hail) to set up EMR with SageMaker.

Due to security requirements, I had to enable Kerberos (a local KDC within the EMR cluster), and I followed this guide (https://aws.amazon.com/blogs/machine-learning/securing-data-analytics-with-an-amazon-sagemaker-notebook-instance-and-kerberized-amazon-emr-cluster/) for the Kerberos setup.

Everything works well, except that the Bokeh plots cannot be saved because of an access restriction.

I ran ls -la / from the SageMaker notebook (via sparkmagic + Livy), but the plot paths /plots and /var/www/html/plots do not appear in the listing and cannot be accessed.

However, when I run ls -la over ssh on the master node, I can see these folders. Changing the permissions with chmod -R 777 /var/www did not resolve the issue either.

Any idea whether there is a Kerberos/Livy setting that hides or protects certain file paths from Kerberos-authenticated users?

Reivax

1 Answer

I found out the reason why this is happening.

When Kerberos authentication is used for EMR, sparkmagic starts the Spark context on a core node instead of the master node. The two nodes have separate local filesystems, so I cannot see paths that were created on the master node but not on the core node.
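One way to confirm this from the notebook is to print the hostname of the machine that actually executes the cell, together with whether the plot directories exist there. The snippet below is only a minimal sketch: describe_paths is a hypothetical helper name (not part of sparkmagic, Livy, or Hail), and it assumes the cell is run in a PySpark kernel through sparkmagic, so that the code executes in the remote Spark driver process rather than on the notebook instance.

```python
import os
import socket


def describe_paths(paths):
    """Return this machine's hostname and whether each path is a directory here.

    Hypothetical helper for illustration only.
    """
    return {
        "host": socket.gethostname(),
        "paths": {p: os.path.isdir(p) for p in paths},
    }


# The two plot directories mentioned in the question.
report = describe_paths(["/plots", "/var/www/html/plots"])
print(report)
```

If the printed host is not the master node's hostname and both paths are reported as missing, the session is running on a different node, and the directories would have to be created on that node (or the plots written to a location shared by all nodes).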

Reivax