
I am using Spark v2.0 and trying to read a csv file using:

spark.read.csv("filepath")

But getting the below error:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
  at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
  at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
  at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
  at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
  at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
  at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
  ... 48 elided
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:515)
  ... 71 more
Caused by: java.io.IOException: Permission denied
  at java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)
  ... 71 more

I have also tried using .format("csv").csv("filepath"), but that gives the same error.

zero323
Pratyush Sharma

2 Answers


If you look at the last part of the exception's stack trace, you'll see that this error is NOT about lacking access to the file at "filepath".

I had a similar issue using Spark shell on my Windows client. This was the error I got:

  at java.io.WinNTFileSystem.createFileExclusively(Native Method)
  at java.io.File.createTempFile(File.java:2024)
  at org.apache.hadoop.hive.ql.session.SessionState.createTempFile(SessionState.java:818)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:513)

Notice how it says WinNTFileSystem in the stack trace (while you have it as UnixFileSystem), which got me to look at this stack trace more closely. I realized the current user does not have access to create a temp file locally. More specifically, org.apache.hadoop.hive.ql.session.SessionState attempts to create a temp file in the Hive local scratch directory. And if the current user does not have enough permissions to do that, you get this error.
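You can reproduce the underlying failure outside of Spark entirely: `File.createTempFile` throws the same `IOException` whenever the target directory isn't writable by the current user. A minimal sketch (using `/proc` as a stand-in for an unwritable scratch directory):

```java
import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    public static void main(String[] args) {
        // A directory the current user cannot create files in,
        // analogous to Hive's local scratch directory without permissions.
        File unwritable = new File("/proc");
        try {
            File.createTempFile("hive-scratch", ".txt", unwritable);
            System.out.println("created (unexpected)");
        } catch (IOException e) {
            // This is the same failure mode SessionState.createTempFile hits.
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
```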

For me, on Windows, I realized I had to "run as administrator" the command prompt used to run Spark Shell. And that worked for me.

For you, on Unix, either running with sudo, updating the Hive config so the local scratch directory points at a location your user can write to, or fixing the permissions on the existing scratch directory should do the trick.
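A sketch of both options (paths are assumptions; check the scratch-directory setting in your own hive-site.xml, and note the default local scratch directory is often under /tmp):

```shell
# Assumed paths -- verify against your Hive config before running.

# Option 1: open up permissions on the existing scratch directory:
#   sudo chmod -R 777 /tmp/hive

# Option 2: create a fresh directory you own and point Spark/Hive at it:
mkdir -p "$HOME/hive-scratch"
chmod 700 "$HOME/hive-scratch"
#   spark-shell --conf spark.hadoop.hive.exec.local.scratchdir="$HOME/hive-scratch"
```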

sudheeshix
  • any idea of the path where the process is trying to create the TempFile? it isn't always possible to run as sudo/administrator? – ines vidal Apr 24 '18 at 12:55
  • "path where the process is trying to create the TempFile?" Check local scratch directory in Hive config hive.exec.scratchdir property in $HIVE_HOME/conf/hive-site.xml – sudheeshix Oct 06 '18 at 03:44

Try this code; it might help.

To read the data from CSV:

Dataset<Row> src = sqlContext.read()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .load("Source_new.csv");

To write the data to CSV:

src.write()
        .format("com.databricks.spark.csv")
        .option("header", "true")
        .save("LowerCaseData.csv");
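Note that in Spark 2.x the CSV source ships with Spark itself, so the external com.databricks.spark.csv package is not needed. A sketch of the same read/write against the built-in source, assuming a SparkSession and the same file names as above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvReadWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv-example")
                .master("local[*]")
                .getOrCreate();

        // Built-in CSV source -- no .format("com.databricks.spark.csv") needed
        Dataset<Row> src = spark.read()
                .option("header", "true")
                .csv("Source_new.csv");

        src.write()
                .option("header", "true")
                .csv("LowerCaseData.csv");

        spark.stop();
    }
}
```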
Nischay