Questions tagged [webhdfs]

WebHDFS is a REST API that supports the complete FileSystem interface for HDFS (Hadoop Distributed File System).

It can be used to connect a third-party tool such as SSIS to a Hadoop data lake; see "Using WebHDFS to connect Hadoop Data Lake to SSIS".
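As a rough illustration of what "REST API" means here, WebHDFS requests are plain HTTP calls against URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The sketch below only builds such a URL; the helper name `webhdfs_url` and the example host are hypothetical, and port 50070 is taken from the questions listed on this page (your cluster may use a different one).

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&..."""
    # The operation name goes into the query string; extra keyword
    # arguments (e.g. overwrite="false") are appended after it.
    query = urlencode({"op": op.upper(), **params})
    # Percent-encode the HDFS path but keep the slashes between components.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


if __name__ == "__main__":
    # Hypothetical host name, used only for illustration.
    print(webhdfs_url("namenode.example.com", 50070, "/user/alice/data.csv", "open"))
```

The resulting URL can be passed to any HTTP client (curl, `urllib.request`, and so on); authentication and HTTPS are deliberately left out of this sketch.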

268 questions
2
votes
0 answers

Hadoop: namenode/dfshealth.html#tab-datanode and namenode:50070/dfshealth.html#tab-overview pages show only 2 active nodes out of 3

I have set up a fully distributed Hadoop system on Ubuntu: my host system plus 2 VirtualBox VMs running on it. When I execute start-dfs.sh and start-yarn.sh from the master node, the datanode gets started on all 3 systems. I can see that using…
2
votes
0 answers

Is there a way to pull an entire directory through WebHDFS in Hadoop?

We have two clusters, and our requirement is to pull data from one cluster to the other. The only option available to us is to pull the data through WebHDFS. Unfortunately, as far as we can see, through WebHDFS we can pull only one file at a time, that…
Raja
  • 513
  • 5
  • 18
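One way to approach the directory question above is to list the directory with the `LISTSTATUS` operation, recurse into subdirectories, and then fetch each file individually. The sketch below covers only the traversal. It assumes a caller-supplied function `list_status(path)` (hypothetical) that returns the parsed JSON body of a `LISTSTATUS` call, which has the shape `{"FileStatuses": {"FileStatus": [...]}}` with `pathSuffix` and `type` fields per entry; the in-memory tree is made up for the demo.

```python
import posixpath


def walk_files(list_status, root):
    """Yield the full HDFS path of every file below the directory `root`.

    `list_status(path)` must return the parsed JSON of a WebHDFS
    LISTSTATUS response for `path`.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        for entry in list_status(directory)["FileStatuses"]["FileStatus"]:
            child = posixpath.join(directory, entry["pathSuffix"])
            if entry["type"] == "DIRECTORY":
                pending.append(child)  # visit subdirectories later
            else:
                yield child  # a regular file: candidate for an OPEN request


if __name__ == "__main__":
    # Made-up directory tree standing in for real LISTSTATUS responses.
    fake_tree = {
        "/data": [{"pathSuffix": "a.txt", "type": "FILE"},
                  {"pathSuffix": "sub", "type": "DIRECTORY"}],
        "/data/sub": [{"pathSuffix": "b.txt", "type": "FILE"}],
    }

    def fake_list_status(path):
        return {"FileStatuses": {"FileStatus": fake_tree[path]}}

    for file_path in walk_files(fake_list_status, "/data"):
        print(file_path)
```

In a real client, `list_status` would issue an HTTP GET with `op=LISTSTATUS` and each yielded path would be downloaded with `op=OPEN`; error handling and authentication are omitted here.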
2
votes
1 answer

How to use UserGroupInformation with Kerberos WebHDFS

The following is the client code on a non-Hadoop system to perform actions on the secured remote HDFS: Configuration conf = new Configuration(); conf.set("hadoop.security.authentication",…
user608020
  • 313
  • 4
  • 15
2
votes
1 answer

CDH WebHDFS request redirects to local address on EC2

I am trying to set up an environment where I run some of my backend locally and send requests to an EC2 instance from my local computer. I have CDH 4.5 set up, and it works OK. When I run the following request curl --negotiate -i -L -u:hdfs…
jamborta
  • 5,130
  • 6
  • 35
  • 55
2
votes
1 answer

No FileSystem for scheme: webhdfs

I'm building a client which pushes some data into my HDFS. Because the HDFS is inside a cluster behind a firewall I use HttpFS as a proxy to access it. The client exits with an IOException when I try to read/write to the HDFS. The message is No…
yvesonline
  • 4,609
  • 2
  • 21
  • 32
2
votes
1 answer

Webhdfs returns wrong datanode address

curl -i -X PUT "http://SomeHostname:50070/webhdfs/v1/file1?op=CREATE" HTTP/1.1 307 TEMPORARY_REDIRECT Content-Type: application/octet-stream Location: http://sslave0:50075/webhdfs/v1/file1?op=CREATE&overwrite=false Content-Length: 0 Server:…
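The excerpt above shows the first half of the two-step `CREATE` flow: the namenode answers with `307 TEMPORARY_REDIRECT` and a `Location` header pointing at a datanode, and the client then sends the file data to that second URL. The sketch below (helper name `datanode_redirect` is hypothetical) only extracts the datanode host and port from such a response, which is useful for checking whether the client can resolve and reach that host at all.

```python
from urllib.parse import urlsplit


def datanode_redirect(status, headers):
    """Return (hostname, port, url) from a WebHDFS 307 redirect response."""
    if status != 307:
        raise ValueError(f"expected a 307 redirect, got {status}")
    location = headers["Location"]
    parts = urlsplit(location)
    return parts.hostname, parts.port, location


if __name__ == "__main__":
    # Header value copied from the question excerpt above.
    host, port, url = datanode_redirect(
        307,
        {"Location": "http://sslave0:50075/webhdfs/v1/file1?op=CREATE&overwrite=false"},
    )
    print(host, port)
```

If the returned hostname is not resolvable from the client machine, the second request of the flow cannot succeed, regardless of how the first one was sent.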
2
votes
1 answer

WebHdfsFileSystem: local IP vs network IP in Hadoop

I have a requirement to read HDFS from outside the HDFS cluster. I stumbled upon WebHdfsFileSystem, and even though I got the idea, I could not make it work with the network address. For example, the code below works fine as long as I use…
2
votes
1 answer

Bare minimum of dependencies to work with HDFS

I need to put some files into HDFS from my client application. I am not planning to schedule a job to hadoop, just need to drop something into HDFS. Maven dependency on hadoop-core brings a lot of stuff like jersey-core etc, which I don't need at…
jdevelop
  • 12,176
  • 10
  • 56
  • 112
1
vote
1 answer

Installing WebHDFS library in Docker failed, Error shows "krb5-config: Permission denied"

I'm trying to install apache-airflow-providers-apache-hdfs library in my Airflow-Docker 2.5.3. I've installed all the necessary Kerberos' libs, and I got the following error: #0 5.236 Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in…
Donny
  • 31
  • 4
1
vote
1 answer

How to connect to HDFS in Airflow?

How do you perform HDFS operations in Airflow? Make sure you install the following Python package: pip install apache-airflow-providers-apache-hdfs #Code Snippet #Import packages from airflow import settings from airflow.models import Connection from…
Swapnil
  • 57
  • 1
  • 3
1
vote
0 answers

How can I set up WebHDFS with SSL on AWS EMR?

Is it possible to set up WebHDFS with SSL on AWS EMR? I am unable to find any documentation. My EMR cluster does show the URLs for the namenode and datanode, and the namenode URL uses port 50070, which is the default for WebHDFS - is…
Ufder
  • 527
  • 4
  • 20
1
vote
1 answer

Unable to upload file or create directory via Hadoop UI

I have installed hadoop-3.2.1 in Ubuntu 18.04 with Java-8. I am able to send files to HDFS using the hadoop fs -put command via terminal. But when I try to upload files or create a directory via UI, I am getting the following errors: While Uploading…
mark86v1
  • 182
  • 1
  • 13
1
vote
1 answer

How to upload a file from EFS (WinSCP) to WebHDFS (Hue/Cloudera) in PowerShell?

I've been trying to break the problem down into two parts in order to automate it: PowerShell: Transfer file from local Desktop to EFS (via WinSCP) - OK PowerShell: Get that same file on EFS (via WinSCP) and Put it into Cloudera WebHDFS (we use…
Petter_M
  • 435
  • 3
  • 10
  • 20
1
vote
0 answers

How to use the hdfscli Python library?

I have the following use case: I want to connect to a remote Hadoop cluster. So I got all the Hadoop conf files (core-site.xml, hdfs-site.xml and others) and stored them in one directory on the local file system. I got the correct keytab and krb5.conf file for…
Neil
  • 11
  • 2
1
vote
0 answers

HttpClient behavior different between .net core 3.1 and .net 5

The below code retrieves a JSON document from a WebHDFS instance using Kerberos authentication: HttpClientHandler clientHandler = new() { Credentials = CredentialCache.DefaultNetworkCredentials, DefaultProxyCredentials =…
vc 74
  • 37,131
  • 7
  • 73
  • 89