I use AWS EMR for our Spark Streaming jobs. I add a step in EMR that reads data from a Kinesis stream. What I need is an approach to stop this step and add a new one.
Right now I spawn a thread from the Spark driver that listens on an SQS queue, and upon receiving a message I call sparkContext.stop(). We use Chef for deployment automation, so when there is a new artifact, Chef puts a message into SQS, the listener thread in the driver reads it and stops the step, and Chef then adds a new step through the EMR API.
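For reference, the listener thread looks roughly like this (a simplified sketch: the queue URL is a placeholder, the SQS client wiring assumes the AWS SDK for Java v1, and in the sketch I stop through the StreamingContext so in-flight batches drain before the underlying SparkContext goes down):

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import org.apache.spark.streaming.StreamingContext

// Spawned from the driver after ssc.start(); blocks on SQS long polls
// until Chef publishes a shutdown message.
def watchForShutdown(ssc: StreamingContext, queueUrl: String): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      val sqs = AmazonSQSClientBuilder.defaultClient()
      var stopped = false
      while (!stopped) {
        // Long-poll for up to 20s per request to keep SQS traffic low.
        val req = new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(20)
        val msgs = sqs.receiveMessage(req).getMessages
        if (!msgs.isEmpty) {
          // Consume the message so the shutdown signal is handled once.
          sqs.deleteMessage(queueUrl, msgs.get(0).getReceiptHandle)
          // Graceful stop: finish in-flight batches, then stop the
          // SparkContext, which ends the EMR step.
          ssc.stop(stopSparkContext = true, stopGracefully = true)
          stopped = true
        }
      }
    }
  })
  t.setDaemon(true)
  t.start()
  t
}
```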
My question is: is this the right way to stop a long-running streaming job on EMR? How would this be handled if Spark were deployed on a standalone cluster instead of EMR?