
I am using IBM Liberty server and running a JSR 352 batch job, which I launch with a REST call from Postman. When I try to stop the job using its instance ID, the status shows "STOPPING" and the job takes a long time to actually stop; sometimes it stays in "STOPPING" for a few hours. How can I force the job to stop?

Note: The job has a partitioned step which reads from a database and creates an output file.

Using the instance ID of the job, I'm trying to stop the job with a PUT request from Postman like below:

PUT https://*****:9443/ibm/api/batch/jobinstances/405573?action=stop

Response returned:

    "jobName": "test-job",
    "executionId": 405574,
    "instanceId": 405573,
    "batchStatus": "STOPPING",
    "exitStatus": "",

I would expect the batch job to be stopped within a few minutes, or an hour at most, when I check the batch status using the URL below. But in some cases it takes a few hours.

https://******:9443/ibm/api/batch/jobinstances/405573
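For reference, a minimal client-side sketch of the same two calls (stop, then poll the instance) using the Java 11+ `java.net.http` client. The base URL argument and the 60 x 1 minute polling budget are hypothetical; authentication and TLS trust setup, which a real call to the server will likely need, are omitted; and the status is pulled out with a regex only to keep the sketch dependency-free, assuming the response carries a `batchStatus` field like the one in the response shown above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StopJobInstance {

    // Pulls "batchStatus": "..." out of the JSON body; a real client would use a JSON library.
    private static final Pattern BATCH_STATUS =
            Pattern.compile("\"batchStatus\"\\s*:\\s*\"([A-Z]+)\"");

    static URI stopUri(String baseUrl, long instanceId) {
        return URI.create(baseUrl + "/ibm/api/batch/jobinstances/" + instanceId + "?action=stop");
    }

    static URI statusUri(String baseUrl, long instanceId) {
        return URI.create(baseUrl + "/ibm/api/batch/jobinstances/" + instanceId);
    }

    static Optional<String> batchStatus(String json) {
        Matcher m = BATCH_STATUS.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Terminal values of the JSR 352 BatchStatus enum; STOPPING is not one of them.
    static boolean isTerminal(String status) {
        return status.equals("STOPPED") || status.equals("COMPLETED")
                || status.equals("FAILED") || status.equals("ABANDONED");
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args[0];                       // hypothetical, e.g. https://host:9443
        long instanceId = Long.parseLong(args[1]);      // e.g. 405573
        HttpClient client = HttpClient.newHttpClient(); // auth and TLS trust setup omitted

        // Same request as in Postman: PUT .../jobinstances/{id}?action=stop with an empty body.
        HttpRequest stop = HttpRequest.newBuilder(stopUri(baseUrl, instanceId))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(client.send(stop, HttpResponse.BodyHandlers.ofString()).body());

        // Poll once a minute for up to an hour (hypothetical budget).
        for (int attempt = 0; attempt < 60; attempt++) {
            HttpRequest get = HttpRequest.newBuilder(statusUri(baseUrl, instanceId)).GET().build();
            String body = client.send(get, HttpResponse.BodyHandlers.ofString()).body();
            String status = batchStatus(body).orElse("UNKNOWN");
            System.out.println("batchStatus=" + status);
            if (isTerminal(status)) {
                return;
            }
            Thread.sleep(Duration.ofMinutes(1).toMillis());
        }
        System.out.println("Job instance still not in a terminal state after polling.");
    }
}
```

Polling only observes the status; it does not make the stop take effect any sooner.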

user3540722
  • Basically, I need to force the running job to stop, with the status updated to STOPPED within a few minutes or an hour at most. Sometimes it takes a few hours or more. – user3540722 Aug 20 '19 at 13:32
  • What type of work is the job running when the stop command is issued? You mentioned a partitioned step reading from a DB; is this what is executing? Is this a chunk step or a batchlet type of step? What version of Liberty is this? – Scott Kurz Aug 20 '19 at 16:18
  • It's a chunk partitioned step. For example, 1000 partitions are loaded with a thread count of 5. Each thread reads from the database and creates a file. I have a time boundary in my Python script to stop the Java batch job when it reaches 10 pm, so I make the REST call with action=stop. But the instance status stays STOPPING for quite a long time (a few hours). – user3540722 Aug 22 '19 at 18:57
  • Below is the Liberty version: Product name: WebSphere Application Server, Product version: 18.0.0.4, Product edition: BASE_ILAN – user3540722 Aug 22 '19 at 18:57
  • See what you think of my answer. I started writing it as a comment with more questions, but it got too big for a comment. – Scott Kurz Aug 27 '19 at 12:44

Basics on "stop", for Batchlet vs Chunk steps

For a chunk step, the container checks to see if a stop has been issued after each item is read and processed. The assumption is the chunk step will read, process, write multiple chunks of multiple items, so a stop check after each item is enough to stop relatively soon. With a batchlet step, on the other hand, where the application processing isn't broken up into anything known to the container, the batch container instead will invoke the user-implemented stop() on a separate thread, which the application can use to interrupt processing on the main process() thread.
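To make the chunk case concrete, here is a deliberately simplified, stand-alone model of the read-process loop described above. It is not Liberty's implementation: writers, chunk commits and checkpoints are left out, and the `Supplier`/`Function` parameters merely stand in for an ItemReader and ItemProcessor. What it shows is that a stop request is only noticed between items, so the time to stop is bounded by how long a single item takes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;
import java.util.function.Supplier;

// Simplified model of a chunk step's read/process loop; NOT Liberty's actual code.
// Writing, chunk commits and checkpoints are intentionally left out.
class ChunkLoopModel {

    static <I, O> List<O> runChunkStep(Supplier<I> reader,            // stands in for ItemReader.readItem()
                                       Function<I, O> processor,      // stands in for ItemProcessor.processItem()
                                       AtomicBoolean stopRequested) { // set by a stop request, possibly from another thread
        List<O> processed = new ArrayList<>();
        I item;
        // A null item means "no more data", as with ItemReader.readItem().
        while ((item = reader.get()) != null) {
            // A slow read or process call here delays the stop check below.
            processed.add(processor.apply(item));
            // The stop request is only noticed between items.
            if (stopRequested.get()) {
                break;
            }
        }
        return processed;
    }
}
```

If one item stands for a large amount of work (for example, a whole output file), a stop issued just after that item starts has to wait for all of it.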

Ideas on Chunk step taking a long time to stop

For a chunk step taking a long time to respond to a stop(), one explanation could simply be that the application is taking a long time to read and process a single item.

If it were important enough, this could perhaps be addressed by refactoring item read and processing into more fine-grained logic, so each item is processed more quickly.
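As a sketch of what "more fine-grained" could mean, suppose (hypothetically) each item currently represents a whole partition's query result. A reader that hands back one row per `readItem()` call lets the stop check happen after every row. The class below is stand-alone so it compiles without the batch API; in a real job it would implement `javax.batch.api.chunk.ItemReader`, whose `readItem()` and `checkpointInfo()` it mirrors (the constructor plays the role of `open(Serializable)`), and the row source would more likely be a JDBC `ResultSet` than the `List` assumed here.

```java
import java.io.Serializable;
import java.util.Iterator;
import java.util.List;

// Stand-alone sketch of a fine-grained reader: one database row per item.
// A real version would implement javax.batch.api.chunk.ItemReader; that is omitted here so the
// sketch compiles without the batch API on the classpath.
class RowAtATimeReader {
    private final Iterator<String> rows; // hypothetical row source; in practice e.g. a JDBC ResultSet
    private long rowsRead;

    // Plays the role of ItemReader.open(Serializable checkpoint).
    RowAtATimeReader(List<String> partitionRows, Serializable checkpoint) {
        this.rows = partitionRows.iterator();
        long alreadyRead = (checkpoint == null) ? 0L : (Long) checkpoint;
        // On restart, skip rows that were already committed by a previous execution.
        for (long i = 0; i < alreadyRead && rows.hasNext(); i++) {
            rows.next();
        }
        this.rowsRead = alreadyRead;
    }

    // One row per call, so the container can check for a stop after every row; null means end of data.
    String readItem() {
        if (!rows.hasNext()) {
            return null;
        }
        rowsRead++;
        return rows.next();
    }

    // Restart position, which the container would save at each chunk commit.
    Serializable checkpointInfo() {
        return rowsRead;
    }
}
```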

Another approach, if the logic can't easily be broken down into smaller "items", would be to refactor this into a batchlet, where you can implement the stop() method yourself, and react appropriately within your application. After all, there's a good chance if you couldn't break down the chunk into smaller items then you weren't getting much value out of checkpoints anyway.
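A minimal sketch of that batchlet approach, assuming the work can be split into units that the main loop checks between. It mirrors the `process()`/`stop()` signatures of `javax.batch.api.Batchlet` but is stand-alone so it compiles without the batch API; `doOneUnitOfWork()` and the unit count are hypothetical placeholders. `stop()` sets a flag and interrupts the `process()` thread, which unblocks sleeps and waits; a thread blocked inside a JDBC call may need something like `Statement.cancel()` instead.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-alone sketch of the batchlet pattern. A real class would implement javax.batch.api.Batchlet
// (same process()/stop() signatures); omitted so the sketch compiles without the batch API.
class StoppableExportBatchlet {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private volatile Thread processThread;
    private final int totalUnits; // hypothetical amount of work
    private int unitsDone;

    StoppableExportBatchlet(int totalUnits) {
        this.totalUnits = totalUnits;
    }

    // Runs on the step's main thread; the returned String is the exit status.
    public String process() throws Exception {
        processThread = Thread.currentThread();
        try {
            for (unitsDone = 0; unitsDone < totalUnits; unitsDone++) {
                if (stopRequested.get()) {
                    return "STOPPED"; // exit status of our choosing
                }
                doOneUnitOfWork();
            }
            return "COMPLETED";
        } catch (InterruptedException e) {
            // Woken up by stop() while blocked; treat it as a stop.
            return "STOPPED";
        } finally {
            processThread = null;
        }
    }

    // Called by the container on a different thread when the job is stopped.
    public void stop() {
        stopRequested.set(true);
        Thread t = processThread;
        if (t != null) {
            t.interrupt(); // unblocks sleeps/waits in process()
        }
    }

    // Hypothetical placeholder for reading some rows and appending them to the output file.
    private void doOneUnitOfWork() throws InterruptedException {
        Thread.sleep(10);
    }

    int unitsDone() {
        return unitsDone;
    }
}
```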

Scott Kurz