I am setting up a Jupyter Notebook service on our company's Cloudera cluster. I understand that Notebook has no built-in security, but security is a must for most enterprises. Can anyone tell me what the best security practices for Jupyter Notebook are?
What I am considering is using Kerberos to control data access, roughly as follows:
- Create multiple service IDs (Unix accounts) for Jupyter Notebook users, e.g. engineering, advisory
- Create a keytab for each service ID
- Create a data repository for each service ID and set its ownership and permissions accordingly, so that users can access their data once Kerberos authentication succeeds
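To make the steps above concrete, here is a rough sketch of what I have in mind. It only builds the commands and does not run them. The realm `EXAMPLE.COM`, the keytab directory, the `/data` root, and the `750` mode are placeholders I made up for illustration; `kinit -k -t` and `hdfs dfs -mkdir/-chown/-chmod` are the standard commands I would expect to use.

```python
from dataclasses import dataclass

# All three values below are hypothetical placeholders, not real settings.
REALM = "EXAMPLE.COM"
KEYTAB_DIR = "/etc/security/keytabs"
DATA_ROOT = "/data"


@dataclass(frozen=True)
class ServiceId:
    """One shared Unix account / Kerberos principal, e.g. 'engineering'."""
    name: str

    @property
    def principal(self) -> str:
        return f"{self.name}@{REALM}"

    @property
    def keytab(self) -> str:
        return f"{KEYTAB_DIR}/{self.name}.keytab"

    @property
    def data_dir(self) -> str:
        return f"{DATA_ROOT}/{self.name}"


def kinit_command(sid: ServiceId) -> list[str]:
    """Command that obtains a ticket for the service ID from its keytab."""
    return ["kinit", "-k", "-t", sid.keytab, sid.principal]


def hdfs_setup_commands(sid: ServiceId) -> list[list[str]]:
    """Commands that create the repository and restrict it to the service ID."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", sid.data_dir],
        ["hdfs", "dfs", "-chown", f"{sid.name}:{sid.name}", sid.data_dir],
        ["hdfs", "dfs", "-chmod", "750", sid.data_dir],
    ]


if __name__ == "__main__":
    for name in ("engineering", "advisory"):
        sid = ServiceId(name)
        print(" ".join(kinit_command(sid)))
        for cmd in hdfs_setup_commands(sid):
            print(" ".join(cmd))
```

The resulting lists could be passed to `subprocess.run` by an admin script; I have kept execution out of the sketch so the layout can be reviewed first.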
However, Kerberos has its drawbacks and can cause problems; for example, the limited lifetime of a ticket obtained from a keytab can make an application fail unexpectedly.
Also, surprisingly, nobody liked Kerberos at any of the companies I have worked for.
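For the lifetime problem mentioned above, the workaround I can imagine is to re-authenticate from the keytab shortly before the ticket expires. Below is a minimal sketch of that check. The 10-minute margin, the function names, and the idea that the expiry time is already known (for example from when the ticket was issued) are my own assumptions, and the `run` argument is a stand-in for something like `subprocess.run`.

```python
from datetime import datetime, timedelta
from typing import Callable, Sequence

# Hypothetical safety margin: renew this long before the ticket expires.
DEFAULT_MARGIN = timedelta(minutes=10)


def needs_renewal(expires_at: datetime, now: datetime,
                  margin: timedelta = DEFAULT_MARGIN) -> bool:
    """True once 'now' is within 'margin' of the expiry time, or past it."""
    return now >= expires_at - margin


def renew_if_needed(expires_at: datetime, now: datetime,
                    kinit_cmd: Sequence[str],
                    run: Callable[[Sequence[str]], object]) -> bool:
    """Call run(kinit_cmd) only when the ticket is about to expire.

    Returns True if a renewal was attempted, False otherwise.
    """
    if needs_renewal(expires_at, now):
        run(kinit_cmd)
        return True
    return False


if __name__ == "__main__":
    expiry = datetime(2024, 1, 1, 12, 0)
    # Hypothetical keytab path and principal, for illustration only.
    cmd = ["kinit", "-k", "-t", "/etc/security/keytabs/engineering.keytab",
           "engineering@EXAMPLE.COM"]
    # Print the command instead of executing it.
    renew_if_needed(expiry, datetime(2024, 1, 1, 11, 55), cmd, run=print)
```

A long-running notebook kernel or a cron job could call `renew_if_needed` periodically, but that still leaves the question of whether there is a cleaner approach.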
Thank you, any input is welcome.