I'm trying to run my Spark application on Bluemix on a schedule. For now I'm scheduling the spark-submit.sh script locally on my machine, but I'd like to use Bluemix for this purpose. Is there any way to set up scheduling directly inside the Bluemix infrastructure for running Spark notebooks or Spark applications?
1 Answer
The Bluemix OpenWhisk offering provides an easy way to run actions on a periodic schedule, similar to cron jobs.
Overview of an OpenWhisk-based solution
OpenWhisk provides a programming model based on actions, triggers, and rules. For this use case, you would do the following (a minimal wsk CLI sketch follows the list):
- Create an action that kicks off your Spark job.
- Use the /whisk.system/alarms package to arrange for triggers to arrive periodically according to your schedule.
- Create a rule that declares that your action should fire whenever a trigger event occurs.
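Here is a rough sketch of how those three pieces fit together with the wsk CLI; the action, trigger, and rule names, the file name, and the cron schedule are made up for illustration:
$ wsk action create submitSparkJob submitSparkJob.js
$ wsk trigger create everyNight --feed /whisk.system/alarms/alarm --param cron '0 0 2 * * *'
$ wsk rule create nightlySparkRule everyNight submitSparkJob
The trigger fires at 02:00:00 UTC every day, and the rule causes the submitSparkJob action to run on each firing.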
Your action can be coded in JavaScript if it's easy to kick off your job from a JavaScript function. If not, and you'd like your action to be implemented by a shell script, you can use OpenWhisk Docker actions to manage your shell script as an action.
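As an illustration only, a JavaScript action might look roughly like the sketch below. It assumes your Spark service exposes some HTTP endpoint for submitting jobs; the host name, path, and parameter names are placeholders rather than a documented Bluemix API, and it relies on the Node.js runtime accepting a Promise returned from main for asynchronous results.
// Hypothetical action that kicks off a Spark job by calling a placeholder HTTP endpoint.
// The host, path, and credential parameters are assumptions -- substitute whatever
// submission mechanism your Spark service actually provides.
var https = require('https');

function main(params) {
    var body = JSON.stringify({ job: params.jobName || 'nightly-spark-job' });
    var options = {
        hostname: 'example-spark-service.mybluemix.net',  // placeholder host
        path: '/submit',                                   // placeholder path
        method: 'POST',
        auth: params.user + ':' + params.password,         // credentials supplied as action parameters
        headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    };
    return new Promise(function (resolve, reject) {
        var req = https.request(options, function (res) {
            resolve({ statusCode: res.statusCode });       // report the HTTP status back to OpenWhisk
        });
        req.on('error', reject);
        req.write(body);
        req.end();
    });
}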
Using the /whisk.system/alarms package to generate events on a schedule
The OpenWhisk documentation includes a detailed description of how to accomplish this. Briefly:
The /whisk.system/alarms/alarm feed configures the Alarm service to fire a trigger event at a specified frequency. The parameters are as follows:
cron: A string, based on the Unix crontab syntax, that indicates when to fire the trigger in Coordinated Universal Time (UTC). The string is a sequence of six fields separated by spaces: X X X X X X. For more details on using cron syntax, see: https://github.com/ncb000gt/node-cron. Here are some examples of the frequency indicated by the string:
* * * * * *: every second.
0 * * * * *: top of every minute.
0 0 * * * *: top of every hour.
0 0 9 8 * *: at 9:00:00AM (UTC) on the eighth day of every month
trigger_payload: The value of this parameter becomes the content of the trigger every time the trigger is fired.
maxTriggers: Stop firing triggers when this limit is reached. Defaults to 1000.
Here is an example of creating a trigger that will be fired once every 20 seconds with name and place values in the trigger event.
$ wsk trigger create periodic --feed /whisk.system/alarms/alarm --param cron '*/20 * * * * *' --param trigger_payload '{"name":"Odin","place":"Asgard"}'
Each generated event will include as parameters the properties specified in the trigger_payload value. In this case, each trigger event will have parameters name=Odin and place=Asgard.
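For completeness, those payload properties arrive in the params object of whatever JavaScript action a rule connects to this trigger, along these lines:
// Sketch of an action receiving the trigger_payload properties as parameters.
function main(params) {
    return { message: 'Hello ' + params.name + ' of ' + params.place };
}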

Can you please elaborate a bit more on how submission of Spark jobs can be accomplished using this method? For example, what would be the best way to acquire credentials for a particular Spark service instance, to design the action that invokes spark-submit, etc.? @stephen-fink – Alex Glikson Aug 01 '16 at 07:33