
We want to execute a set of jobs, but some of them conflict, meaning they cannot be executed in parallel. Is there anything (interfaces / libraries) in the Java ecosystem that supports this kind of workflow?

For example, we have passengers and we'd like to process orders, but two orders for the same passenger cannot be processed in parallel - they must run sequentially.

On the signal that a worker thread is ready to execute a task, the pool of awaiting jobs could check which jobs are still running and pick any non-conflicting one.

The standard Java thread pool executor has an embedded queue, but that queue knows nothing about currently running jobs - it only applies some static ordering, without knowledge of conflicting jobs.
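The mechanism described above could be sketched roughly like this (a minimal illustration, not a library API; the `Job` type and its passenger key are assumptions for this example):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of a conflict-aware job pool: when a worker is ready,
// it picks the first pending job whose key is not currently running.
class ConflictAwarePool {
    record Job(long passengerId, Runnable work) {}

    private final Deque<Job> pending = new ArrayDeque<>();
    private final Set<Long> running = new HashSet<>();

    synchronized void submit(Job job) { pending.addLast(job); }

    // Called by a free worker thread: returns a non-conflicting job, or null.
    synchronized Job take() {
        for (Job job : pending) {
            if (!running.contains(job.passengerId())) {
                pending.remove(job);
                running.add(job.passengerId());
                return job;
            }
        }
        return null; // every pending job conflicts with a running one
    }

    // Called by the worker when a job completes, freeing its key.
    synchronized void finished(Job job) { running.remove(job.passengerId()); }
}
```

The missing piece in this sketch is exactly what the question points at: when `take()` returns null, the worker needs to be woken again once a conflicting job finishes, rather than spinning.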

gavenkoa
  • Try Apache Camel; it will solve your issue. You can also use it in a non-blocking way. https://camel.apache.org/ – Shekhar Jul 19 '23 at 08:59
  • @Shekhar Which feature of Camel would allow parallel execution whilst also limiting to one parallel process per passenger? – lance-java Jul 20 '23 at 14:01
  • So Camel is a big framework for workflows. You can customize the flows completely based on your requirements. It provides interfaces that you can implement yourself to put conditions on flows. – Shekhar Jul 20 '23 at 14:33
  • I'm very familiar with Camel... I use it every day. I want to know the exact mechanism that will allow for parallel execution whilst also preventing messages for the same passenger from executing in parallel. – lance-java Jul 20 '23 at 15:13

2 Answers


In this situation you would likely use sharding/partitioning.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public interface HashFunction<T> {
   /**
    * Calculate a hash that is
    *   - repeatable (same value always gets the same hash)
    *   - evenly distributed (so jobs are distributed fairly across the threads)
    */
   int accept(T value);
}
/**
 * Maintains threadCount single threaded executors
 * Ensures that tasks for the same ID execute serially, on a single thread
 * whilst jobs for different ID's can execute in parallel
 */
public class PartitionedExecutor<ID> {
   private final int threadCount;
   private final HashFunction<ID> hashFunction;
   private final ExecutorService[] executors;
   
   public PartitionedExecutor(int threadCount, HashFunction<ID> hashFunction) {
      this.threadCount = threadCount;
      this.hashFunction = hashFunction;
      this.executors = IntStream.range(0, threadCount)
         .mapToObj(i -> Executors.newSingleThreadExecutor())
         .toArray(ExecutorService[]::new);
   }   
   
   public <V> Future<V> submit(ID identifier, Callable<V> task) {
      // floorMod keeps the index non-negative even for negative hashes
      int threadIndex = Math.floorMod(hashFunction.accept(identifier), threadCount);
      return executors[threadIndex].submit(task);
   }
}
public class PassengerService {
   private final PartitionedExecutor<Long> executor;

   public PassengerService(int threadCount) {
      // assuming passengerId is a Long here 
      // Using Long.hashCode() as the hash function
      this.executor = new PartitionedExecutor<>(threadCount, Long::hashCode);
   }
   
   public Future<Result> processOrder(PassengerOrder order) {
      return executor.submit(order.getPassengerId(), () -> doProcessOrder(order));
   }
   
   public Future<Result> processAmend(PassengerAmend amend) {
      return executor.submit(amend.getPassengerId(), () -> doProcessAmend(amend));
   }

   public Future<Result> processDelete(Long passengerId) {
      return executor.submit(passengerId, () -> doProcessDelete(passengerId));
   }
   
   private Result doProcessOrder(PassengerOrder order) {
      // TODO: implement
   }
   
   private Result doProcessAmend(PassengerAmend amend) {
      // TODO: implement
   }

   private Result doProcessDelete(Long passengerId) {
      // TODO: implement
   }
}

Apache Kafka has this concept at its heart (you can't send a message without first assigning it a partition).
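For illustration, the partition selection itself is just a stable hash modulo the partition count (Kafka's default partitioner actually uses murmur2, but the principle is the same):

```java
// Illustrative sketch: map a key to a partition index, as the
// PartitionedExecutor above does. Math.floorMod guards against
// negative hashCode() values, which a plain % would pass through.
final class Partitions {
    static int partitionFor(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

Because the mapping is deterministic, every job for a given key lands on the same single-threaded executor, which is what gives the per-key serial guarantee.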

lance-java
  • This solution is based on existing Java building blocks (`Executors.newSingleThreadExecutor()`), but `hashCode() % partitionCount` reduces the parallelism, unfortunately. I see the problem with `ThreadPoolExecutor`: it has a constructor taking a `BlockingQueue`, but this queue is not informed about existing jobs or finished jobs... – gavenkoa Jul 19 '23 at 09:52
  • Partitioning seems like a simple, easy-to-understand solution. But partitioning does not give 100% thread usage - it is more like a workaround to avoid conflicting jobs without locking or other methods. Synchronization is done via the already-implemented queues. – gavenkoa Jul 19 '23 at 10:07
  • For most workloads partitioning will evenly distribute your load across the threads. You are correct that some workloads may result in "hot" threads and "idle" threads. The other option would be a custom executor that buffers tasks for passengers that are currently executing on the pool. This complexity and synchronization would likely not yield better performance for most workloads. – lance-java Jul 20 '23 at 11:17

Not sure about libraries directly available for this specific requirement, but this approach would work out well with arguably maximum concurrency.

Sagar
  • `ConcurrentMap` helps with building a "lock", but this solution leaves the rescheduling maintenance outside - which is where the major burden is... Locking is simple, but messaging among the threads and the pool of jobs is not... – gavenkoa Jul 25 '23 at 15:27
  • Could you add more information on the desired behaviour in case of conflict? E.g. with jobList = job1, job2, job1, job1: if thread t1 is executing job1, should it search the entire jobList first (if yes, the dynamic nature of the list, i.e. job additions, is questionable), or can it go ahead and run the task? If t1 and t3 both try to run a task, only one should succeed; the other should ignore the request, mark the task as not completed, and add it back to jobList for retrying. – Sagar Jul 26 '23 at 07:48
  • Right, tasks operate on a resource that has some ID. Two jobs cannot run simultaneously if they refer to the same ID. So when a task is retrieved from the pool of pending tasks, the first thing we need to ensure is that no one is operating on that ID; but then we also need proper rescheduling for postponed conflicting tasks - returning them to the end of the pool is not enough: if there are no non-conflicting pending tasks, we don't want 100% CPU load from picking up / returning the same objects. So a thread-safe check for conflicts & intelligent rescheduling are both necessary. – gavenkoa Jul 26 '23 at 10:35
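The "buffer instead of re-pick" alternative mentioned in the comments above could be sketched like this (names are illustrative, not from any library): tasks for a busy ID are queued behind the running one and handed back to the pool when it finishes, so idle threads never spin over conflicting jobs.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;

// Sketch of a conflict-aware wrapper around any Executor: at most one
// task per key runs at a time; later tasks for the same key are buffered
// and re-submitted to the pool when the running one completes.
class KeyedSerialExecutor<ID> {
    private final Executor pool;
    // key present => a task for that key is currently running;
    // the queue holds tasks buffered behind it
    private final Map<ID, Queue<Runnable>> perKey = new HashMap<>();

    KeyedSerialExecutor(Executor pool) { this.pool = pool; }

    void execute(ID key, Runnable task) {
        Runnable wrapped = () -> {
            try { task.run(); } finally { finished(key); }
        };
        boolean runNow;
        synchronized (perKey) {
            Queue<Runnable> buffered = perKey.get(key);
            if (buffered == null) {
                perKey.put(key, new ArrayDeque<>()); // mark key as busy
                runNow = true;
            } else {
                buffered.add(wrapped);               // conflict: buffer it
                runNow = false;
            }
        }
        if (runNow) pool.execute(wrapped);
    }

    private void finished(ID key) {
        Runnable next;
        synchronized (perKey) {
            Queue<Runnable> buffered = perKey.get(key);
            next = buffered.poll();
            if (next == null) perKey.remove(key);    // key idle again
        }
        if (next != null) pool.execute(next);        // hand back to the pool
    }
}
```

Compared to partitioning, this keeps the full thread pool available for any non-conflicting work at the cost of one shared lock and the per-key bookkeeping.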