I apologize if this question is repeated.
Our current environment:
Java 17
Spring 5.3.18
Spring Batch 4.2.8
At a high level, our architectural intent is to physically separate the launcher threads from the execution threads for our Spring Batch processes, shotgunning heavy workload steps across the available processors on worker nodes. We have designed the partitioners and flows for this model of operation.
The expectation is that on the worker systems we can have a bunch of "step" beans floating loosely in the JVM, to be partitioned on the "master" JVM, propagated out via AMQ, then picked up and executed asynchronously on the worker VMs.
I have reviewed the documentation at https://docs.spring.io/spring-batch/docs/4.2.x/reference/html/spring-batch-integration.html#remote-partitioning . The example given there (and indeed every example I have found to date on the internet) is written as if there is only a single step being run remotely.
Today:
We are using XML bean configuration for the jobs because of some peculiarities with Spring and Java scoping. Ironically, in our case the XML bean definitions offered scoping options that were not available in the Java DSL.
The XML below is an excerpt from a working configuration with a single remote step bean.
On the master side, we have this PartitionHandler configuration:
<bean id="ecPartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
    <property name="stepName" value="as-step0002.slave"/>
    <property name="jobExplorer" ref="jobExplorer"/>
    <property name="messagingOperations" ref="amqMessagingTemplate"/>
</bean>
<int:poller default="true" task-executor="stepTaskExecutor" fixed-delay="1000" />
On the slave side, we have this configuration:
<bean id="stepExecutionRequestHandler"
      class="org.springframework.batch.integration.partition.StepExecutionRequestHandler">
    <property name="jobExplorer" ref="jobExplorer" />
    <property name="stepLocator" ref="stepLocator" />
</bean>
<bean id="stepLocatorAmq"
      class="org.springframework.batch.integration.partition.BeanFactoryStepLocator" />
<bean id="slavePartitionHandler" class="org.springframework.batch.integration.partition.MessageChannelPartitionHandler">
    <property name="stepName" value="as-step0002.slave"/>
    <property name="gridSize" value="3"/>
    <property name="messagingOperations" ref="stepMessagingTemplate"/>
</bean>
<bean id="amq-properties"
      class="com.maxis.mxarchive.spring.InjectableProperties"
      factory-method="getAmqProperties">
    <constructor-arg ref="configPropertiesService" />
</bean>
Observation:
The originating master and the receiving slave message handlers both directly reference the specific step to be executed.
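To make that observation concrete, here is a plain-Java toy model of the routing as I currently understand it. It uses no Spring classes: `StepRequest` and `WorkerStepRegistry` are hypothetical stand-ins, and `as-step0003.slave` in the usage note is an invented second step name. The assumption being illustrated is that the request itself carries the step name, so the worker only needs a name-to-step lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model only: none of these types are Spring Batch classes.
final class StepRequest {
    final String stepName;      // which worker step should run this partition
    final long stepExecutionId; // which partition (step execution) to run

    StepRequest(String stepName, long stepExecutionId) {
        this.stepName = stepName;
        this.stepExecutionId = stepExecutionId;
    }
}

final class WorkerStepRegistry {
    // step name -> the work that step performs for a given step execution id
    private final Map<String, Function<Long, String>> steps = new HashMap<>();

    void register(String stepName, Function<Long, String> step) {
        steps.put(stepName, step);
    }

    // Look the step up by the name carried in the request, then run it.
    String handle(StepRequest request) {
        Function<Long, String> step = steps.get(request.stepName);
        if (step == null) {
            throw new IllegalArgumentException("No step named " + request.stepName);
        }
        return step.apply(request.stepExecutionId);
    }
}
```

With two steps registered under `as-step0002.slave` and the hypothetical `as-step0003.slave`, a request naming the second one is dispatched to it without the registry itself knowing about any specific step. Whether the real worker-side classes behave this way is exactly what I am trying to confirm.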
Question:
From a purely pragmatic perspective, does this mean that I can simply add more MessageChannelPartitionHandler bean pairs referencing the appropriate steps to ensure that spawned partitions are picked up and executed by the correct step beans on the worker systems?
Or do I need to plug in a flow with a decider to pick the appropriate step from the step ExecutionContext?
Or should I implement a StepLocator bean?
Thank you,
Welp, I'm more lost than ever. Since the examples I've found appear to be built to the simplest possible scenario, there is little to generalize to the model I'm trying to build.
Here is an approximation of the architecture as I understand it now:
There will be 0 to n jobs running concurrently, with 0 to n steps from however many jobs running on the slave VMs concurrently.
- Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
- Does each master job (or step?) require its own aggregator? By implication, this means each job (or step) will have its own partition handler (which may be supported by the existing codebase).
- The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the MessageChannelPartitionHandler requires a separate reply channel per step.
What I think is unclear (though I can't be sure, precisely because it is unclear) is how a message on the single reply channel is picked up by the aggregatedReplyChannel and then returned to the correct step. Of course, I could be so lost that I'm asking the wrong questions.
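One way a single shared reply channel could still work is if each reply carried a correlation key and the aggregator grouped on it. The sketch below is a plain-Java toy of that idea, not Spring Integration's aggregator: `Reply`, `ReplyDemultiplexer`, and the `jobExecutionId:stepName` key format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Spring Integration's aggregator.
final class Reply {
    final String correlationKey; // assumed form: jobExecutionId + ":" + stepName
    final int sequenceSize;      // how many partitions were sent for that key
    final String result;         // stand-in for one partition's outcome

    Reply(String correlationKey, int sequenceSize, String result) {
        this.correlationKey = correlationKey;
        this.sequenceSize = sequenceSize;
        this.result = result;
    }
}

final class ReplyDemultiplexer {
    // correlation key -> results received so far for that key
    private final Map<String, List<String>> pending = new HashMap<>();

    // Accept one reply from the shared channel. Returns the complete group
    // once every partition for that key has reported, otherwise null.
    List<String> accept(Reply reply) {
        List<String> group =
                pending.computeIfAbsent(reply.correlationKey, k -> new ArrayList<>());
        group.add(reply.result);
        if (group.size() < reply.sequenceSize) {
            return null; // still waiting on other partitions for this key
        }
        pending.remove(reply.correlationKey);
        return group;
    }
}
```

In this model, replies for different jobs or steps can arrive interleaved on one channel and still be released as separate groups. If the real aggregator works along these lines, shared channels would be viable; if not, per-step channels would be needed.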