To reduce provisioning time, we've decided to keep a dedicated EMR cluster running with 5 instances (about what we expect to need). If we need more, we think we'll have to implement some sort of autoscaling.
I'm not familiar with EMR at all. Does it support autoscaling? I found this in the docs: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-manage-resize.html
Is that the correct place to look for autoscaling, or am I misunderstanding what they mean by "resize"? I've read that one benefit of EMR is "on demand processing", and I think it splits the load between EC2 instances without you specifying how many. That gives me the impression that it scales the number of EC2 instances on its own, meaning we don't need to autoscale ourselves. Am I misunderstanding what "on demand processing" means?
If the resizing link I provided is appropriate for what I'm trying to do, does anyone have experience with determining when to resize? The doc only describes how to resize, not when: for example, how to set up an alarm that triggers a resize. I've used their regular Auto Scaling service, which lets you resize based on certain conditions, but I'm not seeing anything like that here.
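To make the question concrete, here is a rough sketch of the do-it-yourself approach I imagine, and I don't know if it's a sensible one. It assumes boto3, polls the CloudWatch metric `YARNMemoryAvailablePercentage` for the cluster, and resizes one instance group with `modify_instance_groups`. The thresholds, step size, limits, region, and function names are all made up for illustration.

```python
import datetime


def target_instance_count(current, yarn_memory_available_pct,
                          min_count=5, max_count=20,
                          low_pct=15.0, high_pct=75.0, step=2):
    """Pick a new instance count from the share of free YARN memory.

    All thresholds and limits are illustrative assumptions, not AWS defaults.
    """
    if yarn_memory_available_pct < low_pct:
        desired = current + step      # cluster is nearly full: grow
    elif yarn_memory_available_pct > high_pct:
        desired = current - step      # cluster is mostly idle: shrink
    else:
        desired = current             # within the comfortable band
    # Never go below the dedicated baseline or above the budget cap.
    return max(min_count, min(max_count, desired))


def latest_yarn_memory_available_pct(cluster_id, region="us-east-1"):
    """Return the most recent 5-minute average of the metric, or None."""
    import boto3  # imported here so the pure logic above has no dependency

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    now = datetime.datetime.now(datetime.timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElasticMapReduce",
        MetricName="YARNMemoryAvailablePercentage",
        Dimensions=[{"Name": "JobFlowId", "Value": cluster_id}],
        StartTime=now - datetime.timedelta(minutes=15),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    datapoints = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[-1]["Average"] if datapoints else None


def resize_instance_group(cluster_id, instance_group_id, new_count,
                          region="us-east-1"):
    """Ask EMR to change the requested size of one instance group."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    emr.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{
            "InstanceGroupId": instance_group_id,
            "InstanceCount": new_count,
        }],
    )
```

The idea would be to run this from a cron job or similar: read the metric, compute `target_instance_count`, and call `resize_instance_group` only when the result differs from the current count. Is something like this necessary, or does EMR already handle it?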
I'm still unsure whether autoscaling EMR is a bad idea. Is it too involved (since there are entire companies like Qubole that provide this), or maybe not very useful since EMR already uses whatever computing power it needs? I don't know much about what EMR actually provides, so maybe that's why I'm confused.