
SCDF's Composed Task Runner gives us the option to turn on `--increment-instance-enabled`. This option creates an artificial `run.id` parameter that is incremented on every run. As a result, each run is a unique job instance for Spring Batch, and the task starts again from scratch.
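A minimal sketch of that incrementing behavior, assuming it works like Spring Batch's `RunIdIncrementer` (previous `run.id` plus one, starting at 1). Job parameters are modeled as a plain `Map` instead of `JobParameters`, and the class name `RunIdIncrementerSketch` is hypothetical; this only illustrates the idea, not how CTR is actually wired.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a run.id incrementer.
// Parameters are a plain Map rather than Spring Batch's JobParameters.
class RunIdIncrementerSketch {

    static final String RUN_ID = "run.id";

    // Returns a copy of the previous parameters with run.id incremented by one.
    // A missing run.id (first launch) is treated as 0, so the first run gets 1.
    static Map<String, Long> getNext(Map<String, Long> previous) {
        Map<String, Long> next = new HashMap<>();
        if (previous != null) {
            next.putAll(previous); // copy, so the previous parameters are not mutated
        }
        long lastId = next.getOrDefault(RUN_ID, 0L);
        next.put(RUN_ID, lastId + 1);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> first = getNext(null);   // run.id = 1
        Map<String, Long> second = getNext(first); // run.id = 2, i.e. a new job instance
        System.out.println(first.get(RUN_ID) + " -> " + second.get(RUN_ID));
    }
}
```

Because every launch gets a new `run.id`, Spring Batch never sees the same identifying parameters twice, which is why each run starts fresh rather than resuming.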

The problem with the IdIncrementer arises when I mix executions with and without it. When a task does not finish, I want to resume it. However, once the task has finished in a run without the IdIncrementer, I can no longer start it again with the IdIncrementer.

What would be the best way to restart the task while keeping the option to resume it?

My idea is to create a new IdResumer that reuses the `run.id` of the last execution.
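A minimal sketch of what such a hypothetical `IdResumer` could do, under the same simplification (a plain `Map` instead of `JobParameters`): it copies the previous parameters and keeps `run.id` unchanged. `IdResumerSketch` is not an existing SCDF or Spring Batch class; a real version would presumably implement Spring Batch's `JobParametersIncrementer` interface (`JobParameters getNext(JobParameters)`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "IdResumer": NOT an existing Spring Batch or SCDF class.
// Parameters are modeled as a plain Map rather than JobParameters.
class IdResumerSketch {

    static final String RUN_ID = "run.id";

    // Reuses the previous run.id unchanged so the launch targets the same
    // job instance as the last execution. Only the very first launch
    // (no previous run.id) gets a fresh value of 1.
    static Map<String, Long> getNext(Map<String, Long> previous) {
        Map<String, Long> next = new HashMap<>();
        if (previous != null) {
            next.putAll(previous);
        }
        next.putIfAbsent(RUN_ID, 1L);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> lastExecution = new HashMap<>();
        lastExecution.put(RUN_ID, 7L); // e.g. a failed run with run.id = 7
        System.out.println(getNext(lastExecution).get(RUN_ID)); // 7
    }
}
```

Reusing the same `run.id` only helps while the targeted job instance has failed or stopped: Spring Batch restarts such an instance, but if it has already completed, a launch with the same identifying parameters is rejected with `JobInstanceAlreadyCompleteException`.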

We run SCDF 2.2.1 on OpenShift v3.11.98 and use CTR 2.1.1.

Steps to reproduce:

  1. Create a new SCDF task definition with the following definition: `dummy1: dummy && dummy2: dummy && dummy3: dummy`. The dummy app is a Docker container that fails randomly with a 50% chance.
  2. Launch the SCDF task with `--increment-instance-enabled=true` and wait for one of the dummy tasks to fail (relaunch if needed).
  3. To resume the same failed execution, launch the SCDF task again, now with `--increment-instance-enabled=false`, and let it finish successfully (repeat if needed).
  4. Launch the SCDF task again with `--increment-instance-enabled=true`.

At step 4, the composed task throws a `JobInstanceAlreadyCompleteException`, even though `--increment-instance-enabled` is enabled again.

Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={-spring.cloud.data.flow.taskappname=composed-task-runner, -spring.cloud.task.executionid=3190, -spring.datasource.username=testuser, -graph=aaa-stackoverflow-dummy2 && aaa-stackoverflow-dummy3, -spring.cloud.data.flow.platformname=default, -spring.datasource.url=jdbc:postgresql://10.10.10.10:5432/tms_efa?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&currentSchema=dev, -spring.datasource.driverClassName=org.postgresql.Driver, -spring.datasource.password=pass1234, -spring.cloud.task.name=aaa-stackoverflow, -dataflowServerUri=https://scdf-dev.company.com:443/ , -increment-instance-enabled=true}. If you want to run this job again, change the parameters.
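This exception follows from how Spring Batch identifies a job instance: the job name plus its identifying parameters. For a restartable job, a launch with the same identifying parameters as a failed instance restarts it, a launch with new identifying parameters creates a new instance, and a launch matching a completed instance is rejected. The simplified, hypothetical model below illustrates those three cases in plain Java; the in-memory map stands in for the job repository and `IllegalStateException` stands in for `JobInstanceAlreadyCompleteException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of Spring Batch's job-instance rules.
// The real logic lives in the JobRepository; this is only an illustration.
class JobInstanceSketch {

    enum Status { COMPLETED, FAILED }

    // Key: job name + identifying parameters -> status of the last execution.
    private final Map<String, Status> instances = new HashMap<>();

    // Order-independent key built from the job name and sorted parameters.
    static String instanceKey(String jobName, Map<String, String> identifyingParams) {
        return jobName + new TreeMap<>(identifyingParams);
    }

    // Returns "NEW" for a new instance, "RESTART" for a previously failed one,
    // and throws (standing in for JobInstanceAlreadyCompleteException) when the
    // same instance has already completed.
    String launch(String jobName, Map<String, String> identifyingParams, Status outcome) {
        String key = instanceKey(jobName, identifyingParams);
        Status previous = instances.get(key);
        if (previous == Status.COMPLETED) {
            throw new IllegalStateException(
                    "A job instance already exists and is complete for parameters=" + identifyingParams);
        }
        instances.put(key, outcome);
        return previous == null ? "NEW" : "RESTART";
    }

    public static void main(String[] args) {
        JobInstanceSketch repo = new JobInstanceSketch();
        System.out.println(repo.launch("ctr", Map.of("run.id", "1"), Status.FAILED));    // NEW
        System.out.println(repo.launch("ctr", Map.of("run.id", "1"), Status.COMPLETED)); // RESTART
        System.out.println(repo.launch("ctr", Map.of("run.id", "2"), Status.COMPLETED)); // NEW
        try {
            repo.launch("ctr", Map.of("run.id", "2"), Status.COMPLETED);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```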

Is there a better way to resume and restart the task?

KeyMaker00
Daniel Yu
  • Since Composed Task Runner is built using Spring Batch, you can restart the composed task run from the Jobs tab on the dashboard. Repeat steps 1 and 2, but for step 3, go to the Jobs tab on the dashboard, click the drop-down button on the job you want to restart, and select `Restart the job`. – Glenn Renfro Sep 12 '19 at 14:45
  • Thanks for your answer, @Glenn. I couldn't use that function because it probably has a bug in combination with Kubernetes/OpenShift. When I use the button, it starts a pod with the following arguments: `[...] --graph=aaa-stackoverflow-v2-dummy2 && aaa-stackoverflow-v2-dummy3 [...] --graph=aaa-stackoverflow-v2-dummy2 && aaa-stackoverflow-v2-dummy3 --increment-instance-enabled=true [...]` All the other arguments are passed twice as well. I also suspect that it restarts the task instead of resuming it, since it passes `--increment-instance-enabled=true`. Or am I misunderstanding this? – Daniel Yu Sep 12 '19 at 16:00
