
I am using Spark v2.0 and trying to read a csv file using:

spark.read.csv("filepath")

But getting the below error:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
  at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
  at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
  at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
  ... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
  ... 71 more
Caused by: java.io.IOException: Permission denied
  at java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
  ... 71 more

I have also tried using .format("csv").csv("filepath"), but that gives the same error.

zero323
Pratyush Sharma

2 Answers


If you look at the last part of the exception's stack trace, you'll see that this error is NOT about lacking access to the file at "filepath".

I had a similar issue using Spark shell on my Windows client. This was the error I got:

  at java.io.WinNTFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)

Notice how it says WinNTFileSystem in the stack trace (while you have it as UnixFileSystem), which got me to look at this stack trace more closely. I realized the current user does not have access to create a temp file locally. More specifically, org.apache.hadoop.hive.ql.session.SessionState attempts to create a temp file in the Hive local scratch directory. And if the current user does not have enough permissions to do that, you get this error.
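You can reproduce the underlying failure outside of Spark entirely: `File.createTempFile` throws the same `IOException` whenever the target directory isn't writable by the current user. A minimal sketch (using `/proc` as a stand-in for an unwritable scratch directory):

```java
import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    public static void main(String[] args) {
        // A directory the current user cannot create files in,
        // analogous to Hive's local scratch directory without permissions.
        File unwritable = new File("/proc");
        try {
            File.createTempFile("hive-scratch", ".txt", unwritable);
            System.out.println("created (unexpected)");
        } catch (IOException e) {
            // This is the same failure mode SessionState.createTempFile hits.
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```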

For me, on Windows, I realized I had to "run as administrator" the command prompt used to run Spark Shell. And that worked for me.

For you, on Unix, either running with sudo, updating the Hive config so the local scratch directory points at a location your user can write to, or fixing the permissions on the existing scratch directory should do the trick.
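A sketch of both options (paths are assumptions; check the scratch-directory setting in your own hive-site.xml, and note the default local scratch directory is often under /tmp):

```shell
# Assumed paths -- verify against your Hive config before running.

# Option 1: open up permissions on the existing scratch directory:
#   sudo chmod -R 777 /tmp/hive

# Option 2: create a fresh directory you own and point Spark/Hive at it:
mkdir -p "$HOME/hive-scratch"
chmod 700 "$HOME/hive-scratch"
#   spark-shell --conf spark.hadoop.hive.exec.local.scratchdir="$HOME/hive-scratch"
```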

sudheeshix
  • any idea of the path where the process is trying to create the TempFile? it isn't always possible to run as sudo/administrator? – ines vidal Apr 24 '18 at 12:55
  • "path where the process is trying to create the TempFile?" Check local scratch directory in Hive config hive.exec.scratchdir property in $HIVE_HOME/conf/hive-site.xml – sudheeshix Oct 06 '18 at 03:44

Try this code; it might help.

To read the data from CSV:

Dataset<Row> src = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .load("Source_new.csv");

To write the data to CSV:

src.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("LowerCaseData.csv");
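Note that in Spark 2.x the CSV source ships with Spark itself, so the external com.databricks.spark.csv package is not needed. A sketch of the same read/write against the built-in source, assuming a SparkSession and the same file names as above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvReadWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-example")
                .master("local[*]")
                .getOrCreate();

        // Built-in CSV source -- no .format("com.databricks.spark.csv") needed
        Dataset<Row> src = spark.read()
                .option("header", "true")
                .csv("Source_new.csv");

        src.write()
                .option("header", "true")
                .csv("LowerCaseData.csv");

        spark.stop();
    }
}
```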
Nischay