
I have Scala and Java code running in Spark on the Cloudera platform whose simple task is to perform a word count on files in HDFS. My question is: what is the difference between reading the file with this code snippet:

sc.textFile("hdfs://quickstart.cloudera:8020/user/spark/InputFile/inputText.txt")

as opposed to reading from the local drive on the Cloudera platform?

sc.textFile("/home/cloudera/InputFile/inputText.txt")

Isn't the file stored in HDFS in both cases, so that it makes no difference which way I read or write it? Both of these read from and write to HDFS, right? I looked at this thread, but it gave me no clue: Cloudera Quickstart VM illegalArguementException: Wrong FS: hdfs: expected: file:

Could you please give me at least one case where using hdfs:// means something different?

Thank you!

Ashwini

1 Answer


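As far as I know:

  • In sc.textFile("hdfs://quickstart.cloudera:8020/user/spark/InputFile/inputText.txt"), hdfs://quickstart.cloudera:8020 identifies the HDFS NameNode, and /user/spark/InputFile/inputText.txt is the file or directory inside HDFS.
  • In sc.textFile("/home/cloudera/InputFile/inputText.txt"), the path has no scheme, so it is resolved against the default filesystem from the Hadoop configuration (fs.defaultFS). It refers to your local Unix/Linux filesystem only when that default is file:///; if the default is HDFS, the same path is looked up in HDFS instead. To force the local filesystem, prefix the path with file://.

So if you want to read from or write to a file in HDFS regardless of the default, use hdfs://namenodeHost:port, matching your Hadoop configuration.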
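To see why the prefix matters, here is a rough sketch in plain Java of how a path with no scheme gets completed from a default filesystem URI. The class name PathSchemeDemo and the method qualify are hypothetical, and the logic is a simplification built on java.net.URI, not Hadoop's or Spark's actual code. It only shows that the same string can point at different filesystems depending on the configured default.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: a simplified model of path qualification,
// not the real Hadoop implementation.
class PathSchemeDemo {

    // If the path already carries a scheme (hdfs://, file://), keep it as is.
    // Otherwise borrow the scheme and authority from the default filesystem URI.
    public static String qualify(String path, String defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri.toString();
        }
        URI base = URI.create(defaultFs);
        try {
            return new URI(base.getScheme(), base.getAuthority(),
                           uri.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String bare = "/home/cloudera/InputFile/inputText.txt";

        // Same scheme-less path, two different defaults.
        System.out.println(qualify(bare, "hdfs://quickstart.cloudera:8020"));
        System.out.println(qualify(bare, "file:///"));

        // An explicit scheme is never overridden by the default.
        System.out.println(qualify("file://" + bare, "hdfs://quickstart.cloudera:8020"));
    }
}
```

With an HDFS default, the bare path is completed to a location inside HDFS, which is why spelling out hdfs:// or file:// removes the ambiguity.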

Hope this clarifies your doubt!

Sagar Bhalodiya