
When I start spark-notebook using docker and create a new worksheet, the worksheet isn't there the next time I start it.

Here's the command:

docker run -v /Users/pkerp/projects/chairliftplot/:/mnt -p 9000:9000 andypetrella/spark-notebook:0.2.0-spark-1.2.0-hadoop-1.0.4

Here are the warnings / info:

15/02/09 08:38:12 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://Remote@127.0.0.1:41602]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:41602
15/02/09 08:38:12 INFO remote.RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [akka.remote.RemoteWatcher$Heartbeat$] from Actor[akka://NotebookServer/system/remote-watcher#-457307005] to Actor[akka://NotebookServer/deadLetters] was not delivered. [8] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

Is this a misconfiguration issue or something else?

Edit:

So this problem has a few aspects to it.

  1. When the running docker container is stopped with ctrl-c, it actually still exists. Running it anew with the command above starts a separate new container that doesn't have the newly created notebook.

This can be mitigated by listing the running containers with docker ps, finding the right one, and attaching to it with docker attach container_id. The data will still be present (see the sketch below).
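A minimal sketch of that recovery flow; <container_id> is a placeholder for whatever docker ps reports for the spark-notebook image:

# list containers, including stopped ones
docker ps -a
# reattach to a running container by the ID in the first column
docker attach <container_id>
# or restart a stopped one instead of launching a fresh container
docker start -a <container_id>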

  2. Using a mounted volume to store the notebooks leads to a permissions issue. The directory mounted within the container only has owner write permissions, and the owner is user 1000, while spark-notebook runs as the daemon user (user id 1).

There is a long thread about this issue on GitHub, but no clear solution; two generic workarounds are sketched below.
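Both sketches are assumptions based on the uid mismatch described above, not a fix confirmed in that thread:

# option 1: open the host folder for writes by any user before mounting it
chmod -R a+w /Users/pkerp/projects/chairliftplot
# option 2: hand ownership to uid 1, the daemon user the notebook runs as
sudo chown -R 1 /Users/pkerp/projects/chairliftplot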

juniper-

1 Answer


The Dockerfile will keep evolving, but now at least we can back up our notebooks outside the docker container.

This will do the trick:

docker run --rm -v /Users/pkerp/projects/chairliftplot:/opt/docker/notebooks/ext -p 9000:9000 andypetrella/spark-notebook:0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.6.0

The host folder /Users/pkerp/projects/chairliftplot will then back the ext folder shown in the spark-notebook listing.

That means that:

  • all notebooks in /Users/pkerp/projects/chairliftplot will be visible in the ext folder
  • all notebooks newly created in the ext folder will be available in the host folder /Users/pkerp/projects/chairliftplot.

Of course, you could also have used:

docker run --rm -v /Users/pkerp/projects/chairliftplot:/opt/docker/notebooks -p 9000:9000 andypetrella/spark-notebook:0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.6.0

This discards all default notebooks and shows only the content of /Users/pkerp/projects/chairliftplot. However, this way every newly created notebook will be available on the host, no matter which folder it was created in.
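A quick way to verify the mapping; this assumes notebooks are saved as .snb files, which is what spark-notebook uses:

# create a notebook in the web UI, then check the host folder
ls /Users/pkerp/projects/chairliftplot
# the new .snb file should appear here, and it survives --rm because it lives on the host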

Andy Petrella