My question is exactly the same as this one: How to get path to the uploaded file

But when I try it, I get different results.

I can see in the log that my file is uploaded to the staging directory:

20/05/04 15:30:17 INFO Client: Uploading resource file:/home1/irteam/fileName.txt -> hdfs://aaa.aaa.aaa:8020/user/irteam/.sparkStaging/application_1554781627650_743169/fileName.txt

But when I try to read it with

spark.read.text(SparkFiles.get('fileName.txt'))

I get this error:

Input path does not exist: hdfs://aaa.aaa.aaa:8020/tmp/spark-d5854059-2389-4623-a5ce-431789d81bd3/ ...

That is not the staging directory. How can I get it?
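The path in the error looks like a local temporary path that was resolved against the cluster's default filesystem. A minimal sketch of that resolution, assuming SparkFiles.get returned a path without a scheme (the error suggests this, but it is not confirmed here). The helper name qualify and the directory name spark-example are hypothetical and only for illustration:

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Illustrative only: resolve a path with no scheme against a default filesystem URI."""
    if urlparse(path).scheme:
        # The path already names its filesystem (file://, hdfs://, ...), so leave it alone.
        return path
    # No scheme: the default filesystem is prepended, as in the error message.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


default_fs = "hdfs://aaa.aaa.aaa:8020"  # taken from the log line in the question

# Hypothetical local temp path; the real directory name is truncated in the error.
print(qualify("/tmp/spark-example/fileName.txt", default_fs))
print(qualify("file:///tmp/spark-example/fileName.txt", default_fs))
```

The first call shows how a bare local path can end up pointing at HDFS; the second shows that an explicit scheme is left untouched.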

Thank you for reading my question.

zenyatta

1 Answer

Once the file is uploaded to the tmp directory, Spark can access it as if it were in the local environment, so to read the file you can simply try:

spark.read.text('filename.txt')
# In case of a CSV file
spark.read.csv('filename.csv')

Hope it helps.

Shubham Jain
  • Sorry, I should have mentioned that my Spark job is running on a cluster. Locally, that would be great. – zenyatta May 04 '20 at 07:00
  • Then you can read the file directly with spark.read.text('/home1/irteam/fileName.txt') by providing your complete local file path – Shubham Jain May 04 '20 at 07:06
  • I added it with --files /home1/irteam/fileName.txt and then called spark.read.text('/home1/irteam/fileName.txt'), but it fails with java.io.FileNotFoundException. file:/home1/irteam/fileName.txt does not exist. – zenyatta May 04 '20 at 07:33
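
Going by the upload log line in the question, the staged copy sits under the submitting user's .sparkStaging directory, in a folder named after the application ID. A minimal sketch that rebuilds that path, assuming the default staging location under /user/<user> (it can be moved with spark.yarn.stagingDir). The helper name staging_path is hypothetical:

```python
def staging_path(default_fs: str, user: str, app_id: str, file_name: str) -> str:
    """Illustrative only: build the HDFS staging path seen in the upload log line."""
    # Assumes the default layout: <fs>/user/<user>/.sparkStaging/<application id>/<file>
    return f"{default_fs.rstrip('/')}/user/{user}/.sparkStaging/{app_id}/{file_name}"


# Values copied from the log line in the question.
path = staging_path(
    "hdfs://aaa.aaa.aaa:8020",
    "irteam",
    "application_1554781627650_743169",
    "fileName.txt",
)
print(path)
```

In a running job the application ID should be available as spark.sparkContext.applicationId, so the resulting string could be passed to spark.read.text. Whether that suits a given job is untested here.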