I have several notebooks that are run by a "driver" notebook using papermill. These notebooks use the scrapbook library to communicate information to the driver, which then passes it as parameters to other notebooks. I want to use EMR Notebooks to make this "notebook pipeline" run more efficiently. Does AWS EMR Notebooks support scrapbook and papermill, or will I need to refactor my notebooks?
1 Answer
As of now, no, you can't do that directly. What you can do instead (and what we are doing) is the following:
- Create a Python environment on your EMR master node as the hadoop user.
- Install sparkmagic in that environment and configure all kernels as described in the sparkmagic README.md.
- Copy your notebook to the master node, or use it directly from its S3 location.
- Install papermill and run the notebook with it:
papermill s3://path/to/notebook/input.ipynb s3://path/to/notebook/output.ipynb -p param 1
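If the driver itself runs as Python on the master node, the same step can probably be done through papermill's Python API, with scrapbook reading back whatever the notebook glued. The following is only a rough sketch under that assumption: `run_step` and `merge_parameters` are hypothetical helper names, not part of either library, and the S3 paths are the placeholders from the command above.

```python
from typing import Any, Dict


def merge_parameters(base: Dict[str, Any], scraps: Dict[str, Any]) -> Dict[str, Any]:
    """Combine fixed parameters with values glued by an upstream notebook.

    Hypothetical helper: on a name clash the scrap value wins.
    """
    merged = dict(base)
    merged.update(scraps)
    return merged


def run_step(input_nb: str, output_nb: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Execute one notebook with papermill and return its scrapbook data.

    Hypothetical helper. The libraries are imported here so that the module
    can be loaded even where papermill and scrapbook are not installed.
    """
    import papermill as pm
    import scrapbook as sb

    # Run the notebook, injecting params into its "parameters" cell.
    pm.execute_notebook(input_nb, output_nb, parameters=params)

    # Read the executed copy and collect what it recorded with sb.glue().
    # Display-only scraps have no data payload, so their value is None.
    executed = sb.read_notebook(output_nb)
    return {name: scrap.data for name, scrap in executed.scraps.items()}


# Assumed usage in the driver (placeholder paths, not run here):
#   scraps = run_step("s3://path/to/notebook/input.ipynb",
#                     "s3://path/to/notebook/output.ipynb",
#                     {"param": 1})
#   next_params = merge_parameters({"param": 1}, scraps)
```

The values returned by `run_step` can then be merged into the parameters for the next notebook in the pipeline, which mirrors what the driver notebook in the question does.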

anakin