I have a standalone Flink installation on top of which I want to run a streaming job that is writing data into a HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read and write the HDFS. Since I found no documentation on how to make Flink connect with a Kerberos-protected HDFS I had to make some educated guesses about the procedure. Here is what I did so far:
- I created a keytab file for my user.
In my Flink job, I added the following code:
UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");
Finally I am using a
TextOutputFormat
to write data to the HDFS.
When I run the job, I'm getting the following error:
org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBE
ROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1730)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:405)
For some odd reason, Flink seems to try SIMPLE authentication, even though I called loginUserFromKeytab
. I found another similar issue on Stackoverflow (Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)) which had an answer explaining that:
Standalone Flink currently only supports accessing Kerberos secured HDFS if the user is authenticated on all worker nodes.
That may mean that I have to do some authentication at the OS level e.g. with kinit
. Since my knowledge of Kerberos is very limited I have no idea how I would do it. Also I would like to understand how the program running after kinit
actually knows which Kerberos ticket to pick from the local cache when there is no configuration whatsoever regarding this.