The SCDF Composed Task Runner (CTR) lets us turn on the --increment-instance-enabled option. This option adds an artificial run.id job parameter that is incremented on every run. Because every launch has a different run.id, Spring Batch treats it as a new, unique job instance, so the task can always be started again.
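To make the run.id behaviour concrete, here is a minimal plain-Java model of what the flag does. It assumes the flag wires in something equivalent to Spring Batch's RunIdIncrementer: take the previous run's parameters, read run.id (starting at 1 if it is missing), and return a copy with run.id incremented. RunIdIncrementerModel and its Map<String, String> parameters are hypothetical stand-ins for the real JobParametersIncrementer and JobParameters types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java model of a RunIdIncrementer-style incrementer.
// It is NOT the Spring Batch class; parameters are modelled as a String map.
final class RunIdIncrementerModel {
    static final String KEY = "run.id";

    static Map<String, String> getNext(Map<String, String> previous) {
        // Copy the previous parameters so the caller's map is never mutated.
        Map<String, String> next = new HashMap<>();
        if (previous != null) {
            next.putAll(previous);
        }
        long id = 1; // first launch ever: start the sequence at 1
        String current = next.get(KEY);
        if (current != null) {
            id = Long.parseLong(current) + 1; // otherwise bump the last run.id
        }
        next.put(KEY, Long.toString(id));
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> first = getNext(null);   // no previous run yet
        Map<String, String> second = getNext(first); // based on the first run
        System.out.println(first.get(KEY) + " " + second.get(KEY)); // 1 2
    }
}
```

Because each launch gets a run.id that no earlier launch used, its identifying parameters never match an existing job instance.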
The problem arises when I mix runs that use the IdIncrementer with runs that do not. When a task does not finish, I want to resume it. However, once the task has finished successfully in a run without the IdIncrementer, I can no longer start it again with the IdIncrementer.
I was wondering what the best way would be to restart the task while keeping the option to resume it.
My idea would be to create a new IdResumer, which uses the same run.id as the last execution.
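The IdResumer idea could look roughly like the following, again as a plain-Java model rather than real Spring Batch code. IdResumerModel is hypothetical; in Spring Batch it would presumably be an implementation of JobParametersIncrementer whose getNext returns the previous parameters unchanged instead of bumping run.id. The guard for a completed previous run is an assumption added here, because relaunching a completed instance's parameters is exactly what Spring Batch rejects with JobInstanceAlreadyCompleteException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "IdResumer" modelled in plain Java (not Spring Batch code).
// Instead of incrementing run.id, it reuses the previous run's parameters,
// including its run.id, so the launch targets the same job instance.
final class IdResumerModel {

    static Map<String, String> getNext(Map<String, String> previous, boolean previousCompleted) {
        if (previousCompleted) {
            // Assumed guard: a completed instance cannot be resumed, so fail early
            // instead of letting the launch hit JobInstanceAlreadyCompleteException.
            throw new IllegalStateException("Last instance completed; nothing to resume");
        }
        Map<String, String> next = new HashMap<>();
        if (previous != null) {
            next.putAll(previous); // same run.id as the last execution
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> failedRun = new HashMap<>();
        failedRun.put("run.id", "7");
        System.out.println(getNext(failedRun, false)); // {run.id=7}
    }
}
```

Returning a copy rather than the same map keeps the previous execution's parameters untouched if the launcher later merges extra arguments into the result.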
We run SCDF 2.2.1 on OpenShift v3.11.98 and use CTR 2.1.1.
Steps to reproduce:
1. Create a new SCDF task definition with the definition dummy1: dummy && dummy2: dummy && dummy3: dummy. The dummy app is a Docker container that fails randomly with a 50% chance.
2. Execute the SCDF task with --increment-instance-enabled=true and wait for one of the dummy tasks to fail (relaunch if needed).
3. To resume the same failed execution, execute the SCDF task again, now with --increment-instance-enabled=false, and let it finish successfully (repeat if needed).
4. Start the SCDF task again with --increment-instance-enabled=true.
At step 4, the composed task throws a JobInstanceAlreadyCompleteException, even though --increment-instance-enabled is set to true again:
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={-spring.cloud.data.flow.taskappname=composed-task-runner, -spring.cloud.task.executionid=3190, -spring.datasource.username=testuser, -graph=aaa-stackoverflow-dummy2 && aaa-stackoverflow-dummy3, -spring.cloud.data.flow.platformname=default, -spring.datasource.url=jdbc:postgresql://10.10.10.10:5432/tms_efa?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory&currentSchema=dev, -spring.datasource.driverClassName=org.postgresql.Driver, -spring.datasource.password=pass1234, -spring.cloud.task.name=aaa-stackoverflow, -dataflowServerUri=https://scdf-dev.company.com:443/ , -increment-instance-enabled=true}. If you want to run this job again, change the parameters.
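This exception comes from how Spring Batch identifies job instances: an instance is keyed by its identifying parameters, a failed instance can be restarted with the same parameters, and a completed one cannot. The toy model below (InstanceRegistryModel is hypothetical and deliberately simplified; the real check is made by Spring Batch's JobRepository) shows that a fresh run.id leads to a new instance, while reusing a completed instance's parameters throws. Note that the parameter list in the exception above contains no run.id entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of Spring Batch's instance bookkeeping.
// An instance is identified only by its identifying parameters.
final class InstanceRegistryModel {
    enum Status { FAILED, COMPLETED }

    static final class AlreadyCompleteException extends RuntimeException {
        AlreadyCompleteException(Map<String, String> params) {
            super("A job instance already exists and is complete for parameters=" + params);
        }
    }

    // Last known status per instance (keyed by a copy of its identifying params).
    private final Map<Map<String, String>, Status> lastStatusByInstance = new HashMap<>();

    /** Returns "NEW" or "RESTART"; throws if that instance already completed. */
    String launch(Map<String, String> identifyingParams, Status outcome) {
        Map<String, String> key = new HashMap<>(identifyingParams);
        Status previous = lastStatusByInstance.get(key);
        if (previous == Status.COMPLETED) {
            throw new AlreadyCompleteException(key);
        }
        lastStatusByInstance.put(key, outcome); // record how this run ended
        return previous == null ? "NEW" : "RESTART";
    }

    public static void main(String[] args) {
        InstanceRegistryModel repo = new InstanceRegistryModel();
        System.out.println(repo.launch(Map.of("run.id", "1"), Status.FAILED));    // NEW
        System.out.println(repo.launch(Map.of("run.id", "1"), Status.COMPLETED)); // RESTART
        System.out.println(repo.launch(Map.of("run.id", "2"), Status.COMPLETED)); // NEW
        try {
            repo.launch(Map.of("run.id", "2"), Status.COMPLETED); // same params again
        } catch (AlreadyCompleteException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints NEW, RESTART and NEW, followed by the exception message for the second launch with run.id=2.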
Is there a better way to resume and restart the task?