
I run a virtual machine with local instances of Hadoop and Spark-JobServer on it. I created a file named 'test.txt' on HDFS that I want to open from Spark-JobServer. I wrote the following code to do this:

val test1 = sc.textFile("hdfs://quickstart.cloudera:8020/test.txt")
val test2 = test1.count()
test2  // last expression is the job's result; a bare `return` is not valid outside a method in Scala

However, when I want to run these lines, I get an error in the Spark-JobServer:

"Input path does not exist: hdfs://quickstart.cloudera:8020/test.txt"

I looked up the path to HDFS with `hdfs getconf -confKey fs.defaultFS` and it showed me hdfs://quickstart.cloudera:8020 as the path. Why can I not access the test.txt file if this is the correct path to HDFS? If this is the incorrect path, how can I find the correct path?

Jan Janiszewski
  • Most likely you have done a `hadoop fs -put`, but have you done it in the root dir of HDFS? Is the file readable by the user running the Spark job? – Havnar Jan 25 '16 at 13:55
  • The -put was as follows: `hadoop fs -put 'test.txt'` so it is the home directory. What do you mean by "is the file readable"? It is definitely readable from the "local-local" spark-shell on my computer. – Jan Janiszewski Jan 25 '16 at 14:03

1 Answer


Your file is not in the root directory.

You will find your file under hdfs:///user/<your username>/test.txt

When you do a `hadoop fs -put` without specifying a destination path, the file goes into your user's home directory on HDFS, not into the root directory.
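A minimal Scala sketch of how HDFS resolves the two kinds of paths: absolute paths (starting with `/`) are taken from the root, while relative ones are resolved against `/user/<your username>/`. The `fsDefault` value comes from `fs.defaultFS` in the question; the user name `cloudera` is an assumption (the default user on the quickstart VM), so substitute your own:

```scala
// Sketch of HDFS path resolution, assuming fs.defaultFS from the question
// and a hypothetical user name "cloudera" (the quickstart VM default).
val fsDefault = "hdfs://quickstart.cloudera:8020"
val user = "cloudera"

def qualify(path: String): String =
  if (path.startsWith("/")) fsDefault + path              // absolute: from the root
  else s"$fsDefault/user/$user/$path"                     // relative: from the home dir

println(qualify("test.txt"))   // where `hadoop fs -put test.txt` actually lands
println(qualify("/test.txt"))  // where the question's Spark code is looking
```

So the Spark code was reading `/test.txt` while the file sat under `/user/cloudera/test.txt`.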

Check the output of the following to verify this:

hadoop fs -cat test.txt    # relative: resolves under /user/<your username>/
hadoop fs -cat /test.txt   # absolute: the root directory

To put the file in the root directory instead, run `hadoop fs -put test.txt /`

and see if your spark code works.

Havnar