I would like to read a file from HDFS into Spark via HttpFS or WebHDFS, something along the lines of

sc.textFile("webhdfs://myhost:14000/webhdfs/v1/path/to/file.txt")

or, ideally,

sc.textFile("httpfs://myhost:14000/webhdfs/v1/path/to/file.txt")

Is there a way to get Spark to read the file over WebHDFS or HttpFS?

Brian Hess

2 Answers

I believe WebHDFS and HttpFS act like streaming sources that transmit the data over a REST API.

If so, Spark Streaming can be used to receive the data from WebHDFS or HttpFS.
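
For reference, the REST call that WebHDFS and HttpFS expose for reading a file can be sketched as follows. This is a minimal illustration using only the Python standard library, not Spark or Spark Streaming. The helper names `webhdfs_open_url` and `read_webhdfs_text` and the user name are hypothetical; the host, port and path are the ones from the question.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(host, port, hdfs_path, user=None):
    """Build the WebHDFS REST URL that opens (reads) a file."""
    params = {"op": "OPEN"}
    if user is not None:
        # Only relevant for simple (non-Kerberos) authentication.
        params["user.name"] = user
    # Keep "/" separators but percent-encode other special characters.
    encoded_path = quote(hdfs_path.lstrip("/"), safe="/")
    return f"http://{host}:{port}/webhdfs/v1/{encoded_path}?{urlencode(params)}"


def read_webhdfs_text(host, port, hdfs_path, user=None, encoding="utf-8"):
    """Fetch a whole file over HTTP and return it as text (small files only)."""
    with urlopen(webhdfs_open_url(host, port, hdfs_path, user)) as response:
        return response.read().decode(encoding)


# Hypothetical user name; host, port and path follow the question.
url = webhdfs_open_url("myhost", 14000, "/path/to/file.txt", user="hypothetical_user")
print(url)
```

Calling `read_webhdfs_text` with the same arguments would pull the file to the driver as a plain string, which only makes sense for small files; it does not distribute the read the way `sc.textFile` does.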

Vijay Innamuri

According to the SPARK-2930 documentation enhancement request ("clarify docs on using webhdfs with spark.yarn.access.namenodes"), spark.yarn.access.namenodes should also work for webhdfs as well as hdfs.

See "Running Spark on YARN" for more details about spark.yarn.access.namenodes.
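
As a rough sketch of how that property might be passed, the snippet below assembles the `--conf` argument for `spark-submit`. The helper name `yarn_namenode_conf` and the hosts and ports are hypothetical. As far as I understand, the setting lists the filesystems Spark requests security tokens for on a secure cluster, so it may not be needed on an unsecured one.

```python
def yarn_namenode_conf(filesystems):
    """Return spark-submit arguments that set spark.yarn.access.namenodes."""
    # The property takes a comma-separated list of filesystem URIs.
    value = ",".join(filesystems)
    return ["--conf", f"spark.yarn.access.namenodes={value}"]


# Hypothetical hosts and ports; replace them with the cluster's real namenodes.
args = yarn_namenode_conf(
    ["hdfs://nn1.example.com:8020", "webhdfs://nn2.example.com:50070"]
)
print(" ".join(["spark-submit"] + args))
```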

Shawn Guo