
I'm trying to implement a mechanism where the runnables act as both producer and consumer.

The situation is this:

I need to read records from the DB in batches and process them, and I'm attempting this with a producer-consumer pattern: get a batch, process it, get the next batch, process it. A new batch is fetched whenever a worker sees that the queue is empty, and one of the threads goes off and fetches it. The problem is that I can't mark the records that get fetched for processing; that's my limitation. So if the next batch is fetched before the previous one has been committed entirely, I might fetch the same records again. Therefore I need to be able to commit the previous batch entirely before pulling the next one. I'm getting confused about what to do here. I've tried keeping a count of the fetched records and then holding my get() until that count is reached, too.

What's the best way of handling this situation? To recap: I'm processing records from the DB in chunks, and the biggest limitation I have is that I can't mark the records that have been picked up. So I want the batches to go through sequentially, but a batch should use multithreading internally.

public class DealStoreEnricher extends AsyncExecutionSupport {
private static final int BATCH_SIZE = 5000;
private static final Log log = LogFactory.getLog(DealStoreEnricher.class);
private final DealEnricher dealEnricher;
private int concurrency = 10;
private final BlockingQueue<QueryDealRecord> dealsToBeEnrichedQueue;
private final BlockingQueue<QueryDealRecord> dealsEnrichedQueue;
private DealStore dealStore;
private ExtractorProcess extractorProcess;
ExecutorService executor;

public DealStoreEnricher(DealEnricher dealEnricher, DealStore dealStore, ExtractorProcess extractorProcess) {
    this.dealEnricher = dealEnricher;
    this.dealStore = dealStore;
    this.extractorProcess = extractorProcess;
    dealsToBeEnrichedQueue = new LinkedBlockingQueue<QueryDealRecord>();
    dealsEnrichedQueue = new LinkedBlockingQueue<QueryDealRecord>(BATCH_SIZE * 3);
}

public ExtractorProcess getExtractorProcess() {
    return extractorProcess;
}

public DealEnricher getDealEnricher() {
    return dealEnricher;
}

public int getConcurrency() {
    return concurrency;
}

public void setConcurrency(int concurrency) {
    this.concurrency = concurrency;
}

public DealStore getDealStore() {
    return dealStore;
}

public DealStoreEnricher withConcurrency(int concurrency) {
    setConcurrency(concurrency);
    return this;
}

@Override
public void start() {
    super.start();
    executor = Executors.newFixedThreadPool(getConcurrency());
    for (int i = 0; i < getConcurrency(); i++)
        executor.submit(new Runnable() {
            public void run() {
                try {
                    QueryDealRecord record = null;
                    while ((record = get()) != null && !isCancelled()) {
                        try {
                            update(getDealEnricher().enrich(record));
                            processed.incrementAndGet();
                        } catch (Exception e) {
                            failures.incrementAndGet();
                            log.error("Failed to process deal: " + record.getTradeId(), e);
                        }
                    }
                } catch (InterruptedException e) {
                    setCancelled();
                }
            }
        });

    executor.shutdown();
}

protected void update(QueryDealRecord enrichedRecord) {
    dealsEnrichedQueue.add(enrichedRecord);
    if (batchComplete()) {
        List<QueryDealRecord> enrichedRecordsBatch = new ArrayList<QueryDealRecord>();
        synchronized (this) {
            dealsEnrichedQueue.drainTo(enrichedRecordsBatch);
        }
        if (!enrichedRecordsBatch.isEmpty())
            updateTheDatabase(enrichedRecordsBatch);
    }
}

private void updateTheDatabase(List<QueryDealRecord> enrichedRecordsBatch) {
    getDealStore().insertEnrichedData(enrichedRecordsBatch, getExtractorProcess());
}

/**
 * @return true if processed records have reached the batch size or there's
 *         nothing to be processed now.
 */
private boolean batchComplete() {
    return dealsEnrichedQueue.size() >= BATCH_SIZE || dealsToBeEnrichedQueue.isEmpty();
}

/**
 * Gets an item from the queue of things to be enriched
 * 
 * @return {@linkplain QueryDealRecord} to be enriched
 * @throws InterruptedException
 */
protected synchronized QueryDealRecord get() throws InterruptedException {
    try {
        if (!dealsToBeEnrichedQueue.isEmpty()) {
            return dealsToBeEnrichedQueue.take();
        } else {
            List<QueryDealRecord> records = getNextBatchToBeProcessed();
            if (!records.isEmpty()) {
                dealsToBeEnrichedQueue.addAll(records);
                return dealsToBeEnrichedQueue.take();
            }
        }
    } catch (InterruptedException ie) {
        throw new UnRecoverableException("Unable to retrieve QueryDealRecord", ie);
    }
    return null;
}

private List<QueryDealRecord> getNextBatchToBeProcessed() {
    return getDealStore().getTheRecordsThatNeedEnriching(getExtractorProcess());
}

@Override
public void stop() {
    super.stop();
    if (executor != null)
        executor.shutdownNow();
}

@Override
public boolean await() throws InterruptedException {
    return executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS) && !isCancelled() && complete();
}

@Override
public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
    return executor.awaitTermination(timeout, unit) && !isCancelled() && complete();
}

private boolean complete() {
    setCompleted();
    return true;
}

}

karansardana
  • Maybe you want a [`Phaser`](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Phaser.html) to control the “current batch” for all threads. – Holger Dec 12 '14 at 14:21
  • If you want one task to wait for another to complete before it runs, the simplest thing to do is to use one thread. Multiple threads work best when they have independent tasks to perform, i.e. one doesn't wait for the other. – Peter Lawrey Dec 12 '14 at 14:50
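
A minimal standalone sketch of the Phaser idea from the first comment (the class name, worker count, batch count, and printouts are made up): each worker is registered as a party, and arriveAndAwaitAdvance() keeps every thread on the same batch until all of them have finished it.

import java.util.concurrent.Phaser;

public class PhaserBatchSketch {
    public static void main(String[] args) {
        final int workers = 4;
        // One party per worker; each phase of the Phaser corresponds to one batch.
        final Phaser phaser = new Phaser(workers);

        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    for (int batch = 0; batch < 3; batch++) {
                        // ... process this worker's share of the current batch ...
                        System.out.println("worker " + id + " done with batch " + batch);
                        // Block here until every worker has finished the current batch,
                        // so batch N+1 cannot start before batch N is complete.
                        phaser.arriveAndAwaitAdvance();
                    }
                }
            }).start();
        }
    }
}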

2 Answers


You're already using a BlockingQueue - it does all that work for you.

However, you're using the wrong method, addAll(), to add new elements to the queue. That method throws an exception if the queue cannot accept the elements. You should use put() instead, because that's the blocking method corresponding to take(), which you are using correctly.
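
For illustration, the difference shows up immediately with a small bounded queue (a standalone sketch; the class name and the queue size are made up): add() fails at once when the queue is full, whereas put() waits for space.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PutVsAdd {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue with room for a single element.
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1);

        queue.add("first");       // fits; the queue is now full
        // queue.add("second");   // would throw IllegalStateException: Queue full

        new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(100);                        // simulate a slow consumer
                    System.out.println("took: " + queue.take());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();

        queue.put("second");      // blocks until the consumer frees a slot
        System.out.println("put succeeded");
    }
}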

Regarding your statement in the post title:

second batch shouldn't come until the previous batch is complete

You need not be concerned about the timing of the incoming versus outgoing batches if you use BlockingQueue correctly.

gknicker

It looks like a Semaphore will work perfectly for you. Have the producing thread acquire the semaphore before adding a batch, and have the consuming thread release it when it completes the batch.

BlockingQueue<Batch> blockingQueue = ...;
Semaphore semaphore = new Semaphore(1);

Producing thread

Batch batch = db.getBatch();
semaphore.acquire(); // wait until previous batch completes
blockingQueue.add(batch);

Consuming thread

for(;;){
    Batch batch = blockingQueue.take();
    doBatchUpdate(batch);
    semaphore.release(); // tell next batch to run
}
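
For completeness, here is a standalone sketch of how the whole flow could be wired up, with the single-permit semaphore keeping batches sequential while a worker pool still parallelises the records inside each batch. The class SequentialBatchProcessor and the methods fetchBatch(), processRecord() and commitBatch() are made-up stand-ins for the asker's DealStore/DealEnricher calls, and String is just a placeholder record type.

import java.util.*;
import java.util.concurrent.*;

public class SequentialBatchProcessor {

    private final BlockingQueue<List<String>> batches = new LinkedBlockingQueue<List<String>>();
    private final Semaphore batchPermit = new Semaphore(1); // at most one batch in flight

    public void run() throws InterruptedException {
        final ExecutorService workers = Executors.newFixedThreadPool(10);

        // Producer: fetch the next batch only after the previous one has been committed.
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    List<String> batch;
                    while (!(batch = fetchBatch()).isEmpty()) {
                        batchPermit.acquire(); // blocks until the previous batch is done
                        batches.put(batch);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        // Consumer: one batch at a time, parallelised inside the batch via invokeAll().
        while (producer.isAlive() || !batches.isEmpty()) {
            List<String> batch = batches.poll(1, TimeUnit.SECONDS);
            if (batch == null)
                continue;

            List<Callable<Void>> tasks = new ArrayList<Callable<Void>>();
            for (final String record : batch) {
                tasks.add(new Callable<Void>() {
                    public Void call() {
                        processRecord(record);
                        return null;
                    }
                });
            }
            workers.invokeAll(tasks); // waits until every record in the batch is processed

            commitBatch(batch);       // commit before letting the producer fetch again
            batchPermit.release();
        }
        workers.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        new SequentialBatchProcessor().run();
    }

    // Stand-ins for the asker's DealStore / DealEnricher calls.
    private List<String> fetchBatch() { return Collections.emptyList(); }
    private void processRecord(String record) { }
    private void commitBatch(List<String> batch) { }
}

The key point is the single permit: the producer cannot fetch batch N+1 until batch N has been committed and the permit released, which also prevents re-fetching records that were never marked as picked up.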
John Vint