0

I followed the tutorial for setting up JupyterHub on an AWS EMR cluster at this link: https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/

I got the cluster up and running, but now my question is how do I stress/load test? (i.e. simulate 100 users running through the notebooks simultaneously).

In a classroom setting, I had about 30 users sshed into my cluster running through the notebook exercises, but there was a huge slowdown when more people started executing the code blocks in the notebooks. What happened was some python library imports took forever, some exercises stopped working or was just hanging. Cloudwatch showed that there was a network bottleneck.

Basically what I'm asking is how can I go about debugging something like that? What's the best way to simulate multiple users sshing into the EMR cluster, opening up jupyter notebooks and running the code blocks concurrently?

noobprogrammer
  • 345
  • 1
  • 6
  • 20

1 Answers1

1

You should look (and contribute?) to project like this one which are meant to load-test JupyterHub and should migrate to jupyterHub organisation once more polished.

Note that in your case you are not really wishing to test JupyterHub, you are testing your cluster; just run N scripts in parallel importing your library and you have your load test.

Matt
  • 27,170
  • 6
  • 80
  • 74