Cloudera Hadoop: File reading/writing in HDFS

Question

I have Scala and Java code running in Spark on the Cloudera platform whose simple task is to perform a word count on files in HDFS. My question is: what is the difference between reading the file with this code snippet

sc.textFile("hdfs://quickstart.cloudera:8020/user/spark/InputFile/inputText.txt")

as opposed to reading from the local drive on the Cloudera platform?

sc.textFile("/home/cloudera/InputFile/inputText.txt")

Isn't the file stored in HDFS in both cases, so that reading or writing either way makes no difference? Both of these read from and write to HDFS, right? I looked at this thread, but it gave me no clue: Cloudera Quickstart VM illegalArguementException: Wrong FS: hdfs: expected: file:

Could you please tell me at least one case where using hdfs:// means something different?

Thank You!

score 0 · Accepted Answer · answered Jan 10 '17 at 13:33

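As far as I know:

sc.textFile("hdfs://quickstart.cloudera:8020/user/spark/InputFile/inputText.txt"): in this line, hdfs://quickstart.cloudera:8020 names the HDFS NameNode, and /user/spark/InputFile/inputText.txt is the file or directory inside HDFS.
sc.textFile("/home/cloudera/InputFile/inputText.txt"): this path has no scheme, so it is resolved against the default file system (fs.defaultFS in the Hadoop configuration). When that default is the local file system (file:///), '/home/cloudera/InputFile/inputText.txt' refers to your local Unix/Linux file system.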

So if you want to be sure you are reading from or writing to a file in HDFS, use the full hdfs://namenodeHost:port prefix, with the host and port taken from your Hadoop configuration.
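
To make the role of the prefix concrete, here is a minimal sketch in plain Java that only parses the two path strings from the question. It does not contact a cluster or use Spark. The class name PathSchemeSketch and the helper describe are hypothetical, made up for this illustration. The one Hadoop fact it leans on is that a path without a scheme is resolved against whatever fs.defaultFS is set to.

```java
import java.net.URI;

public class PathSchemeSketch {

    // Hypothetical helper: reports which file system a path string names explicitly.
    // It inspects the string only; nothing is opened or read.
    public static String describe(String path) {
        URI uri = URI.create(path);
        String scheme = uri.getScheme();
        if (scheme == null) {
            // No prefix: Hadoop falls back to the configured default (fs.defaultFS).
            return "no scheme: resolved against the configured default file system";
        }
        if (scheme.equals("hdfs")) {
            // The host and port identify the NameNode; the rest is the path inside HDFS.
            return "HDFS via NameNode " + uri.getHost() + ":" + uri.getPort()
                    + ", path " + uri.getPath();
        }
        if (scheme.equals("file")) {
            return "local file system, path " + uri.getPath();
        }
        return "other scheme: " + scheme;
    }

    public static void main(String[] args) {
        System.out.println(describe("hdfs://quickstart.cloudera:8020/user/spark/InputFile/inputText.txt"));
        System.out.println(describe("/home/cloudera/InputFile/inputText.txt"));
        System.out.println(describe("file:///home/cloudera/InputFile/inputText.txt"));
    }
}
```

If the goal is to read the local copy regardless of how the cluster is configured, spelling the path with file:// removes the ambiguity in the same way that hdfs:// does for HDFS.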

Hope this clarifies your doubt!
