I have a Spring Boot application that uses spring-yarn-boot:2.2.0.RELEASE to access a Hadoop filesystem (HDFS). The operations I perform are LISTSTATUS, GETFILESTATUS and OPEN (to read a file). The HDFS URI is specified in application.properties:
spring.hadoop.fsUri=webhdfs://127.0.0.1:50070/webhdfs/v1/
I create a bean and pass it the Hadoop Configuration (which Spring somehow automagically prepares for me on startup):
SimplerFileSystem fs = new SimplerFileSystem(FileSystem.get(configuration));
FsShell shell = new FsShell(configuration);
Everything worked as expected, but problems started when I got two new requirements.
First, HDFS will be protected with SSL from now on. I can't find any way to tell my application that the fsUri starting with webhdfs:// is actually an HTTPS connection. And if I give the https URL directly, I get an exception:
java.io.IOException: No FileSystem for scheme: https
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
... which is thrown by this code: FileSystem.get(configuration).
This is driving me crazy; I can't find a way to get past it.
The second requirement is that I need to authenticate against WebHDFS with basic authentication. I can't find any means for this in the client API either.
Has anyone done this before and has instructions to share? Or does anyone know a different client API that I can use to accomplish this?
One option is to implement the REST calls myself with RestTemplate or another REST client API, but this does not look like an unusual use case, so I'm really hoping something already exists.
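For reference, here is a minimal sketch of what the do-it-yourself REST fallback might look like using only the JDK. The class name, method names, host, port 50470 and credentials are all hypothetical and made up for illustration. It builds WebHDFS REST URLs of the form /webhdfs/v1/<path>?op=<OPERATION> and attaches a standard HTTP Basic Authorization header. Note that OPEN normally answers with a redirect to a datanode, and whether the credentials are sent along on that redirect depends on the HTTP client and on whatever gateway sits in front of HDFS, so treat this as a starting point only:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Hadoop or Spring.
public class WebHdfsBasicAuthSketch {

    // Builds the REST URL for an operation such as LISTSTATUS, GETFILESTATUS or OPEN.
    // Assumes hdfsPath starts with "/" and is already URL-safe (no spaces etc.).
    static URI operationUri(String baseUrl, String hdfsPath, String op) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + "/webhdfs/v1" + hdfsPath + "?op=" + op);
    }

    // Standard HTTP Basic scheme: "Basic " + base64("user:password").
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Prepares (but does not yet send) an authenticated GET request.
    // The request goes out when the caller reads the response.
    static HttpURLConnection prepareGet(URI uri, String user, String password)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }
}
```

Usage would be along the lines of prepareGet(operationUri("https://127.0.0.1:50470", "/tmp", "LISTSTATUS"), user, password) followed by reading conn.getInputStream(), which returns the JSON response for LISTSTATUS and GETFILESTATUS.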
EDIT:
Found a solution to the HTTPS problem: use swebhdfs:// as the URL prefix and everything works. I still haven't found a solution to the basic auth problem.
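To illustrate the scheme mapping behind this fix: Hadoop picks the FileSystem implementation from the URI scheme, so an https:// URL has to be expressed as swebhdfs:// (and http:// as webhdfs://) before it reaches FileSystem.get(configuration). A small sketch of that conversion, using only java.net.URI; the class and method names are hypothetical, and the host and port in the usage note are placeholders:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop or Spring.
public class WebHdfsSchemes {

    // Rewrites http -> webhdfs and https -> swebhdfs, leaving host, port and path as-is.
    // URIs that already use another scheme are returned unchanged.
    static URI toHadoopUri(URI uri) throws URISyntaxException {
        String scheme = uri.getScheme();
        String hadoopScheme;
        if ("https".equalsIgnoreCase(scheme)) {
            hadoopScheme = "swebhdfs";
        } else if ("http".equalsIgnoreCase(scheme)) {
            hadoopScheme = "webhdfs";
        } else {
            return uri;
        }
        return new URI(hadoopScheme, uri.getUserInfo(), uri.getHost(),
                uri.getPort(), uri.getPath(), uri.getQuery(), uri.getFragment());
    }
}
```

For example, toHadoopUri(URI.create("https://127.0.0.1:50470/")) yields swebhdfs://127.0.0.1:50470/, which is the kind of value that would go into spring.hadoop.fsUri.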