When I start spark-notebook using Docker and create a new worksheet, the worksheet is gone the next time I start it.
Here's the command:
docker run -v /Users/pkerp/projects/chairliftplot/:/mnt -p 9000:9000 andypetrella/spark-notebook:0.2.0-spark-1.2.0-hadoop-1.0.4
Here are the warnings/info:
15/02/09 08:38:12 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://Remote@127.0.0.1:41602]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:41602
15/02/09 08:38:12 INFO remote.RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [akka.remote.RemoteWatcher$Heartbeat$] from Actor[akka://NotebookServer/system/remote-watcher#-457307005] to Actor[akka://NotebookServer/deadLetters] was not delivered. [8] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
Is this a misconfiguration issue or something else?
Edit:
So this problem has a few aspects to it.
- When the running Docker container is stopped with ctrl-c, it actually still exists in the background. Re-running the command above starts a separate new container, which doesn't have the newly created notebook. This can be mitigated by listing the containers with docker ps, finding the original one, and attaching to it with docker attach container_id. The data will still be present there.
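Assuming the standard Docker CLI, the recovery steps above look roughly like this (the container ID is illustrative):

```shell
# List all containers, including stopped ones, to find the original container
docker ps -a

# If the original container is still running, reattach to it:
docker attach <container_id>

# If it shows as Exited, restart it first, then attach:
docker start <container_id>
docker attach <container_id>
```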
- Using a mounted volume to store the notebooks leads to a permissions issue. The directory mounted into the container has write permission for its owner only, and the owner is user 1000, while spark-notebook runs as the daemon user (user id 1).
There is a long thread about this issue on GitHub, but no clear solution.
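One workaround, assuming you are willing to relax permissions on the host side, is to make the mounted directory world-writable so the in-container daemon user (uid 1) can create notebooks in it. The path below is a placeholder, not the actual directory from the question:

```shell
# Placeholder path; substitute the host directory you mount with -v.
NOTEBOOK_DIR=/tmp/notebook-demo
mkdir -p "$NOTEBOOK_DIR"

# The host directory is owned by uid 1000 with owner-only write permission,
# but spark-notebook inside the container writes as the daemon user (uid 1).
# Granting write access to all users lets uid 1 save notebooks there.
chmod a+rwx "$NOTEBOOK_DIR"
ls -ld "$NOTEBOOK_DIR"
```

The coarser alternative of chown-ing the directory to uid 1 would also work, but it makes the files awkward to edit as your normal host user.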