
I am running an EMR cluster with Spark/Livy and would like to test Spark Structured Streaming. I am using the managed Jupyter Notebook service (which connects via Livy). However, when I run this code in Jupyter:

query = (wordCounts
    .writeStream
    .queryName("streamingDF")
    .outputMode('complete')
    .format('memory')
    .start())

I receive the following error:

An error occurred while calling o98.start. : org.apache.hadoop.security.AccessControlException: Permission denied: user=livy, access=WRITE, inode="/mnt/tmp":hadoop:hadoop:drwxr-xr-x

How do I change the permissions, and to what? Livy seems to be writing temp data to HDFS. I thought that with the 'memory' format the output is written to the driver's memory and not to disk.

Tex

1 Answer


You have to SSH into the master node and run sudo usermod -a -G hdfsadmingroup livy. By default, the "livy" user created for the Jupyter notebook in AWS has no write permission on HDFS.
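To see why the write is refused, it can help to read the error message field by field. The snippet below is only a rough sketch: explain_denial is a hypothetical helper written for this answer, not part of Spark, Livy or Hadoop, and it assumes the message has exactly the layout shown in the question. It shows that /mnt/tmp is owned by hadoop:hadoop with mode drwxr-xr-x, so only the owner may write. Presumably the usermod command works because hdfsadmingroup is configured as the HDFS superuser group on EMR, but that is an assumption on my part.

```python
import re

# Error text copied from the question.
ERROR = ('Permission denied: user=livy, access=WRITE, '
         'inode="/mnt/tmp":hadoop:hadoop:drwxr-xr-x')


def explain_denial(message):
    """Split an HDFS 'Permission denied' message into its fields.

    Hypothetical helper for illustration; it assumes the message
    format shown in the question.
    """
    match = re.search(
        r'user=(\w+), access=(\w+), '
        r'inode="([^"]+)":(\w+):(\w+):([d-][rwx-]{9})',
        message)
    if match is None:
        raise ValueError("unrecognised message format")
    user, access, path, owner, group, mode = match.groups()
    # The mode string is a type flag followed by three rwx triplets:
    # owner, group, other.
    return {
        "user": user,
        "access": access,
        "path": path,
        "owner": owner,
        "group": group,
        "owner_can_write": "w" in mode[1:4],
        "group_can_write": "w" in mode[4:7],
        "other_can_write": "w" in mode[7:10],
    }


info = explain_denial(ERROR)
for key, value in info.items():
    print(f"{key}: {value}")
```

Since livy is neither the owner (hadoop) nor granted write access through the group or "other" bits, the temporary directory that the streaming query tries to create under /mnt/tmp is rejected.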

I'm probably way too late to help the original author, but hopefully this will save future devs some time.