
I'm working in an environment where, due to recent developments in auditing requirements, it has become necessary to log all Jupyter Notebook inputs when a user is accessing data, i.e. create an audit trail. The minimum requirement is to log all instances where a user reads, writes or displays data. Currently the audit trail is solid for the database, but stops there as each user can access the database locally and read, write or manipulate the data in a notebook.

Python's standard logging facilities seem suited to debugging rather than auditing. IPython does include a facility for logging all user input (for instance, through the %logstart magic), but the user has direct control over that logging and can escape it with %logstop.

One option I've explored is JupyterHub, where an IPython start-up script would ensure that logging is initiated for every kernel. However, the user may still stop the logging at any time.
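To make the start-up idea concrete, here is a minimal sketch of such a script, assuming it is placed in the IPython profile's startup directory and that the IPython version passes an execution-info object with a `raw_cell` attribute to `pre_run_cell` callbacks. The log path and the names `make_audit_logger`, `make_pre_run_cell_hook` and `install` are hypothetical and chosen only for illustration.

```python
import getpass
import logging


def make_audit_logger(path):
    """Return a logger that appends timestamped records to `path`."""
    logger = logging.getLogger("notebook_audit")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers if run twice
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def make_pre_run_cell_hook(logger, user):
    """Build a callback that records the raw source of each cell."""
    def log_cell(info=None):
        # Assumption: `info` carries the cell source in `raw_cell`;
        # older IPython versions may call the hook without arguments.
        raw = getattr(info, "raw_cell", None)
        logger.info("user=%s input=%r", user, raw)
    return log_cell


def install(path="/var/log/jupyter/audit.log"):  # hypothetical path
    """Register the hook if running inside IPython; return True on success."""
    try:
        ip = get_ipython()  # defined only inside an IPython session
    except NameError:
        return False
    hook = make_pre_run_cell_hook(make_audit_logger(path), getpass.getuser())
    ip.events.register("pre_run_cell", hook)
    return True


install()
```

This has the same weakness as %logstart: the hook runs inside the user's own kernel, so the user can unregister it or edit the log file if they have write access to it.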

Is there a way to prevent this or to otherwise log all Jupyter notebook inputs when accessing data?

soolo
  • What do you mean by "each user can access the database locally"? If you're logging access to the database, then is that not everything you need? – Oliver Charlesworth May 31 '18 at 08:18
  • We log reading from and writing to the database, but we need to be able to log reading, writing, displaying or manipulating the data in a notebook as well to have a complete audit trail in case of data breaches etc. – soolo May 31 '18 at 08:22
  • @soolo What is special about the notebook? What if the user runs the same code without the notebook? What if they save data to a file and then open it with another program? Do you need to log that too? – Stop harming Monica May 31 '18 at 08:48
  • @Goyo the database side log reports the source of the data request, which in our case is always a request made through a notebook kernel. Although technically other data requests are possible, they are not allowed so creating an audit trail for them is unnecessary. – soolo May 31 '18 at 09:13
  • @soolo What do you mean "not allowed"? Are users unable to run other programs? They must be able to run a browser at a minimum. Also you say "log all Jupyter notebook inputs" but it looks like you need to log outputs too. – Stop harming Monica May 31 '18 at 09:25
  • @soolo You mean "log everything the kernel runs"? – Stop harming Monica May 31 '18 at 09:27
  • @Goyo "not allowed" as in it should not happen, and if it does happen the user loses the juridical protection that we aim to provide with the audit trail. We do not need to make it impossible to access data in other ways (e.g. directly through console), we just need to ensure that a) all read/write to database is logged according to how it was accessed (done), b) the user can opt-in to an audit trail by accessing the database through a notebook, c) that the user does not have the permissions to manipulate how this audit trail works, i.e. how notebook activity is logged. – soolo May 31 '18 at 09:55

0 Answers