
I am setting up a Jupyter Notebook service on our company's Cloudera cluster. I understand Notebook has little built-in security, but security is a must for most enterprises. Can anyone tell me what the best security practices for Jupyter Notebook are?

What I have in mind is using Kerberos to control data access, roughly as follows:

  1. Create multiple service IDs (Unix accounts) for Jupyter Notebook users, e.g. engineering, advisory
  2. Create a keytab for each service ID
  3. Create a data repository for each service ID and set ownership and permissions accordingly, so that users can access their data once Kerberos authentication succeeds
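
To make the three steps concrete, here is a minimal sketch that only builds the commands for one service ID and prints them instead of running them. It assumes an MIT Kerberos KDC administered through `kadmin` (an Active Directory setup would create keytabs differently) and that the HDFS commands are run by an HDFS superuser. The realm `EXAMPLE.COM`, the keytab directory, the `/data` root and the function name `provisioning_commands` are all made up for illustration.

```python
import shlex

# All three values are hypothetical placeholders, not taken from any real cluster.
REALM = "EXAMPLE.COM"
KEYTAB_DIR = "/etc/security/keytabs"
DATA_ROOT = "/data"


def provisioning_commands(service_id):
    """Build (but do not run) the commands for the three steps for one service ID."""
    principal = f"{service_id}@{REALM}"
    keytab = f"{KEYTAB_DIR}/{service_id}.keytab"
    repo = f"{DATA_ROOT}/{service_id}"
    return [
        # Step 1: Unix account with a group of the same name.
        ["useradd", "--user-group", service_id],
        # Step 2: Kerberos principal and keytab (assumes MIT Kerberos kadmin).
        ["kadmin", "-q", f"addprinc -randkey {principal}"],
        ["kadmin", "-q", f"ktadd -k {keytab} {principal}"],
        # Step 3: HDFS directory readable only by the owner and its group.
        ["hdfs", "dfs", "-mkdir", "-p", repo],
        ["hdfs", "dfs", "-chown", f"{service_id}:{service_id}", repo],
        ["hdfs", "dfs", "-chmod", "750", repo],
    ]


if __name__ == "__main__":
    for sid in ("engineering", "advisory"):
        for cmd in provisioning_commands(sid):
            print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them with whoever owns the KDC before anything is actually created.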

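However, Kerberos has its drawbacks and can cause problems. For example, when a ticket obtained from a keytab reaches the end of its lifetime (or the keytab itself is invalidated by a key change), an application can fail unexpectedly.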
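
One way I can imagine softening the expiry problem is to check the ticket cache before each job and re-authenticate from the keytab when needed. The sketch below assumes the MIT Kerberos client tools, where `klist -s` prints nothing and exits non-zero if the cache is missing or expired, and `kinit -kt` obtains a ticket from a keytab. The function name `ensure_ticket` and its `run` parameter are hypothetical; `run` exists only so the logic can be exercised without a real KDC.

```python
import subprocess


def ensure_ticket(keytab, principal, run=subprocess.run):
    """Return True if a new ticket was requested, False if the cached one is still valid."""
    # `klist -s` is silent; a zero exit status means a readable, unexpired cache.
    if run(["klist", "-s"]).returncode == 0:
        return False
    # No valid ticket: authenticate again from the keytab, raising if kinit fails.
    run(["kinit", "-kt", keytab, principal], check=True)
    return True
```

Calling this at the start of each notebook job (or from a timer) would not help if the keytab itself has been invalidated, but it should cover plain ticket expiry.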

And surprisingly, no one likes Kerberos at any of the companies I have worked for.

Thank you, any input is welcome.

Choix
  • 1
    _"No one likes Kerberos"_ > everyone **fears** Kerberos. It is an old and tricky technology, Java support for Kerberos is kind of lame, Hadoop added its own lame workarounds on top of Java... And Kerberos was never designed for distributed systems to begin with. Read the GitBook named _Hadoop and Kerberos, the Madness beyond the Gate_ by Steve Loughran (Hortonworks) – Samson Scharfrichter Aug 10 '18 at 11:25
  • 1
    On the other hand, if you can use Microsoft Active Directory for authentication (Kerberos) & authorization (LDAP) on Linux, then life becomes easier for all users (including your Chief Security Officer)... – Samson Scharfrichter Aug 10 '18 at 11:28
  • 1
    Ideally, the service would check user identities (based on Kerberos creds) and forward these creds to Hadoop services. Except that "double hop" is usually forbidden because a compromised service could use stolen credentials to hop anywhere... The workaround used by Hue (and Oozie for different reasons) is to run a privileged "proxy" account that can impersonate any base user against core Hadoop services (using auth tokens). But since your session does not have pure Kerberos creds you can't tap non-core services (ZK, HBase) from the session. – Samson Scharfrichter Aug 10 '18 at 11:33
  • Thank you for your comment, Samson. I just hit the first roadblock here: https://stackoverflow.com/questions/51791486/klist-error-bad-format-in-credentials-cache, and I would appreciate it if you could help there. TIA – Choix Aug 10 '18 at 17:37
