
I apologize if this question is repeated.

Our current environment:

Java 17
Spring 5.3.18
Spring Batch 4.2.8

At a high level, our architectural intent is to physically separate the launcher threads from the execution threads for our Spring Batch processes, shotgunning heavy workload steps across the available processors on worker nodes. We have designed the partitioners and flows for this model of operation.

The expectation is that on the worker systems we can have a bunch of "step" beans floating loosely in the JVM, to be partitioned at the "master" JVM, propagated out via AMQ, then picked up and executed asynchronously at the worker VMs.

I have reviewed the documentation at https://docs.spring.io/spring-batch/docs/4.2.x/reference/html/spring-batch-integration.html#remote-partitioning . The example given (and indeed every example I have found to date on the internet) is written as if there is "A" single step being run remotely.

Today:

We are using XML bean configuration for the jobs because of some peculiarities with Spring and Java scoping. Ironically, in our case the XML bean definitions offered scoping options that were not available in the Java DSL.

The XML below is an excerpt from a working configuration with a single remote step bean.

On the master side, we have this PartitionHandler configuration:

<bean id="ecPartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
    <property name="stepName" value="as-step0002.slave"/>
    <property name="jobExplorer" ref="jobExplorer"/>
    <property name="messagingOperations" ref="amqMessagingTemplate"/>
</bean>

<int:poller default="true" task-executor="stepTaskExecutor" fixed-delay="1000" />

On the slave side, we have this configuration:

<bean id="stepExecutionRequestHandler"
    class="org.springframework.batch.integration.partition.StepExecutionRequestHandler">
    <property name="jobExplorer" ref="jobExplorer" />
    <property name="stepLocator" ref="stepLocator" />
</bean>
<bean id="stepLocatorAmq"
    class="org.springframework.batch.integration.partition.BeanFactoryStepLocator" />

<bean id="slavePartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler"> 
    <property name="stepName" value="as-step0002.slave"/> 
    <property name="gridSize" value="3"/>
    <property name="messagingOperations" ref="stepMessagingTemplate"/> 
</bean>

<bean id="amq-properties"
    class="com.maxis.mxarchive.spring.InjectableProperties"
    factory-method="getAmqProperties">
    <constructor-arg ref="configPropertiesService" />
</bean>

Observation:

The originating master and the receiving slave message handlers both directly reference the specific step to be executed.

Question:

From a purely pragmatic perspective, does this mean that I can simply add more MessageChannelPartitionHandler bean pairs referencing the appropriate steps to ensure that spawned partitions are picked up and executed by the correct step beans on the worker systems?

Or do I need to plug in a flow with a decider to pick the appropriate step from the step's ExecutionContext?

Or should I implement a StepLocator bean?
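
To make the first option concrete, the pattern under consideration would be something like the following on the master side; the bean id `ecPartitionHandler2` and the step name `as-step0003.slave` are invented here purely for illustration:

```xml
<!-- Hypothetical second handler: one MessageChannelPartitionHandler per
     remotely executed step, each bound to its own step name. -->
<bean id="ecPartitionHandler2" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
    <property name="stepName" value="as-step0003.slave"/>
    <property name="jobExplorer" ref="jobExplorer"/>
    <property name="messagingOperations" ref="amqMessagingTemplate"/>
</bean>
```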

Thank you,

Welp, I'm more lost than ever. Since the examples I've found appear to be built to the simplest possible scenario, there is little to generalize to the model I'm trying to build.

Here is an approximation of the architecture as I understand it now:

(architecture diagram not reproduced here)

There will be 0 to n jobs running concurrently, with 0 to n steps from however many jobs running concurrently on the slave VMs.

  1. Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
  2. Does each master job (or step?) require its own aggregator? By implication this means each job (or step) will have its own partition handler (which may be supported by the existing codebase).
  3. The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the MessageChannelPartitionHandler requires a separate reply channel per step.

What I think is unclear (but I can't tell, since it's unclear) is how the single reply channel is picked up by the aggregatedReplyChannel and then returned to the correct step. Of course, I could be so lost that I'm asking the wrong questions.
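
For what it's worth, MessageChannelPartitionHandler does expose a replyChannel property, so a dedicated reply channel per handler would presumably look roughly like this (the channel id below is made up):

```xml
<!-- Illustrative only: a pollable reply channel dedicated to one handler,
     set via the handler's replyChannel property. -->
<int:channel id="as-step0002-replies">
    <int:queue/>
</int:channel>

<bean id="ecPartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
    <property name="stepName" value="as-step0002.slave"/>
    <property name="replyChannel" ref="as-step0002-replies"/>
    <property name="jobExplorer" ref="jobExplorer"/>
    <property name="messagingOperations" ref="amqMessagingTemplate"/>
</bean>
```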

pojo-guy
    The fact that you have a `slavePartitionHandler` on the worker side is a bit confusing to me and I'm not sure I understand what you are looking for. Are you trying to run multiple steps in a single worker or trying to run a partitioned step on the worker as well (meaning each worker creates a second level of partitioning, ie partitioning the partition that was assigned to it)? – Mahmoud Ben Hassine Aug 04 '22 at 07:17
  • High level expectation: I want to leverage the resources of a cluster to run all of the heavy lifting steps in all of my running jobs. My particular processes are amenable to massively parallel processing. I have potentially hundreds of jobs running at any given time, each of which has different processing (steps), and are divisible into potentially thousands of partitions of a few thousand rows of data each. I want the heavy lifting to be shotgunned out to the cluster for processing. The slavePartitionHandler was implemented a few years back as a result of another SO thread. – pojo-guy Aug 04 '22 at 14:57
  • ... (continued) Yes, it is possible for a remote flow to further split or partition – pojo-guy Aug 04 '22 at 15:44
  • ... On the other hand if it is redundant, extraneous, or has a negative impact on the system, i am all in favor of removing the slavePartitionHandler you observe. – pojo-guy Aug 04 '22 at 16:41
  • Thank you for the updates. I will add an answer with some details, hoping to help as much as possible. – Mahmoud Ben Hassine Aug 05 '22 at 09:03

1 Answer


The type of step executed on the worker side is completely arbitrary, so nothing prevents you from running a partitioned step on the worker side as well. This, indeed, allows you to implement a second level partitioning, in which each worker can further partition the partition that was assigned to it.

Now to answer your questions:

From a purely pragmatic perspective, does this mean that I can simply add more MessageChannelPartitionHandler bean pairs referencing the appropriate steps to ensure that spawned partitions are picked up and executed by the correct step beans on the worker systems?

Yes. As mentioned previously, the step on the worker side can itself be a partitioned step, so it would require its own Partitioner/PartitionHandler pair.
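
As a sketch, a second-level partitioned step on the worker side could be declared along these lines (the `workerPartitioner` bean and the leaf step name are assumptions, not part of the original configuration):

```xml
<!-- The worker step registered as "as-step0002.slave" is itself a
     partitioned step, fanning its assigned partition out across local
     threads via a TaskExecutorPartitionHandler. -->
<batch:step id="as-step0002.slave">
    <batch:partition step="as-step0002.leaf" partitioner="workerPartitioner">
        <batch:handler task-executor="stepTaskExecutor" grid-size="3"/>
    </batch:partition>
</batch:step>
```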

Or do I need to plug in a flow with a decider to pick the appropriate step from the step ExecutionContext?

Nothing wrong with this approach, but I would personally not recommend it. The reason is that it makes the implementation complex: almost all components would have to be step-scoped to access the ExecutionContext in order to set/get the partitioning meta-data (step to execute, partition definition, etc.) needed to do the work. Give it a try, and you will quickly see what I'm trying to explain.

Or should I implement a StepLocator bean?

That could be an option if the step bean is defined outside the application context bootstrapped on the worker side. Otherwise, the default BeanFactoryStepLocator can be used to locate the (partitioned) step defined in the worker context.
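
For illustration, the worker-side wiring with the default locator might look roughly like this; the channel names are placeholders:

```xml
<!-- BeanFactoryStepLocator resolves any step bean in this context by the
     name carried in the incoming StepExecutionRequest, so one handler
     serves all worker steps. -->
<bean id="stepLocator"
    class="org.springframework.batch.integration.partition.BeanFactoryStepLocator"/>

<int:service-activator ref="stepExecutionRequestHandler"
    input-channel="stepRequests" output-channel="stepReplies"/>
```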


That said, I just wanted to share some personal experience from working towards the same goal: how to maximize resource utilization when running multiple jobs on different machines. Here is a non-exhaustive list of what worked well:

  • Design job instances to be independent of each other (for example, a job instance per input file, instead of a single job that processes all input files). This maximizes parallelism and enables fault-tolerance.
  • Combine remote partitioning/chunking with a multi-threaded step on workers. This is similar to what you are trying to do; the only difference is using a multi-threaded step on the workers instead of a partitioned step as you are trying to do (which I have never tried, BTW).
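
A minimal sketch of that second point, assuming reader/writer beans named `itemReader`/`itemWriter` and reusing the `stepTaskExecutor` from the question (the throttle limit and commit interval are illustrative):

```xml
<!-- Multi-threaded worker step: the tasklet's task-executor processes
     chunks concurrently instead of partitioning a second time. -->
<batch:step id="as-step0002.slave">
    <batch:tasklet task-executor="stepTaskExecutor" throttle-limit="8">
        <batch:chunk reader="itemReader" writer="itemWriter" commit-interval="1000"/>
    </batch:tasklet>
</batch:step>
```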

I tried to summarize my experience on the matter in this post: Spring Batch on Kubernetes: Efficient batch processing at scale.

Mahmoud Ben Hassine
  • I'll make sure to consume that linked article. As I slowly remember what you and your compatriots have tried to tell me in prior chains, the receiving MessageChannelPartitionHandler only responds to requests for the step that is identified. So if I have 100 available steps to be queued up by the master partitioner, I need 100 pairs of MessageChannelPartitionHandler attached to the AMQ system. – pojo-guy Aug 05 '22 at 14:54
  • Okay, I'm more confused than ever. I've started trying to formally analyze what we have that works and translate that into the next steps. Updates posted to question. – pojo-guy Aug 22 '22 at 19:20
  • Opening this as a new question referencing this one – pojo-guy Aug 22 '22 at 20:26
  • https://stackoverflow.com/questions/73450768/how-to-configure-channels-and-amq-for-spring-batch-integration-where-all-steps-a – pojo-guy Aug 22 '22 at 21:39