I am using pykafka to consume messages, currently with a balanced_consumer on a single topic. I now need to consume messages from a second topic as well and, if possible, prioritize consumption between the topics. How can I handle this? Is there perhaps another Python library that supports it?
1 Answer
I recently published a post about this issue.
Even though I am using Java, you may find the concept described there useful for your case.
Here is how we tackled the issue of prioritizing Kafka topics:
We developed a mechanism to prioritize the consumption of Kafka topics. For each message consumed from Kafka, it checks whether to process the message now or hold the processing for later.
We mapped each partition to a Boolean in topicPartitionLocks, which blocks consumption of that partition when necessary. Blocking the partitions that are ahead, while continuing to consume from the tardy ones, is what prioritizes the topics. A TimerTask updates this map, and our consumers check whether they are “allowed” to consume or have to wait, as you can see in the method waitForLatePartitionIfNeeded.
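    import java.util.Map;
    import java.util.Optional;
    import java.util.TimerTask;
    import java.util.concurrent.ConcurrentHashMap;

    public class Prioritizer extends TimerTask {
        // "topic + partition" -> true if consumption of that partition is blocked
        private final Map<String, Boolean> topicPartitionLocks = new ConcurrentHashMap<>();
        // "topic + partition" -> latest timestamp seen for that partition (populated elsewhere)
        private final Map<String, Long> topicPartitionLatestTimestamps = new ConcurrentHashMap<>();
        // Note: maxGap and isSameTopicAsMinPartition are defined elsewhere and are not shown here.

        @Override
        public void run() {
            updateTopicPartitionLocks();
        }

        private void updateTopicPartitionLocks() {
            // Long::compare avoids the overflow risk of casting (o1 - o2) to int
            Optional<Long> minValue = topicPartitionLatestTimestamps.values().stream().min(Long::compare);
            if (!minValue.isPresent()) {
                return;
            }
            for (Map.Entry<String, Long> pair : topicPartitionLatestTimestamps.entrySet()) {
                String topicPartition = pair.getKey();
                // Lock partitions that are more than maxGap ahead of the slowest one,
                // unless they belong to the same topic as the slowest partition
                boolean shouldLock = pair.getValue() > (minValue.get() + maxGap)
                        && !isSameTopicAsMinPartition(minValue.get(), topicPartition);
                topicPartitionLocks.put(topicPartition, shouldLock);
            }
        }

        // Accessor used by waitForLatePartitionIfNeeded
        public Map<String, Boolean> getTopicPartitionLocks() {
            return topicPartitionLocks;
        }

        public boolean isLocked(String topicPartition) {
            // Treat unknown partitions as unlocked instead of throwing a NullPointerException
            return topicPartitionLocks.getOrDefault(topicPartition, false);
        }
    }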
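To see the locking rule in isolation, here is a minimal, self-contained sketch of it without Kafka or a timer. The class name LockRuleSketch, the method computeLocks, and the maxGapMs parameter are assumptions made for illustration; the answer does not show how maxGap is defined or how isSameTopicAsMinPartition works, so the same-topic exemption is left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class LockRuleSketch {

    // For each partition key, decides whether it should be blocked:
    // true when its latest timestamp is more than maxGapMs ahead of the
    // partition that is furthest behind. (Hypothetical helper, not the
    // author's code; the same-topic exemption is omitted.)
    static Map<String, Boolean> computeLocks(Map<String, Long> latestTimestamps, long maxGapMs) {
        Map<String, Boolean> locks = new HashMap<>();
        if (latestTimestamps.isEmpty()) {
            return locks;
        }
        long min = Collections.min(latestTimestamps.values());
        for (Map.Entry<String, Long> e : latestTimestamps.entrySet()) {
            locks.put(e.getKey(), e.getValue() > min + maxGapMs);
        }
        return locks;
    }

    public static void main(String[] args) {
        Map<String, Long> ts = new HashMap<>();
        ts.put("topicA0", 1_000L); // lagging partition, keeps consuming
        ts.put("topicB0", 9_000L); // 8 s ahead, blocked when the gap is 5 s
        System.out.println(computeLocks(ts, 5_000L));
    }
}
```

With a 5-second gap, only the partition that is 8 seconds ahead is marked as locked, so the lagging one gets a chance to catch up.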
waitForLatePartitionIfNeeded method
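    private void waitForLatePartitionIfNeeded(final String topic, int partition) {
        String topicPartition = topic + partition;
        // Register the partition as unlocked the first time it is seen
        prioritizer.getTopicPartitionLocks().putIfAbsent(topicPartition, false);
        // Assumed: the original snippet does not show where startTime is set
        long startTime = System.currentTimeMillis();
        while (prioritizer.isLocked(topicPartition)) {
            monitorWaitForLatePartitionTimes(topicPartition, startTime);
            Misc.sleep(timeToWaitBetweenGapToTardyPartitionChecks.get());
        }
    }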
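The loop above keeps waiting for as long as the partition stays locked. A minimal sketch of the same polling pattern with an upper bound on the wait follows. The class BoundedWaitSketch, the method waitWhileLocked, and both timing parameters are hypothetical and not part of the answer's code; bounding the wait is just one possible way to stop a consumer from sleeping indefinitely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class BoundedWaitSketch {

    // Polls the lock map until the key is unlocked or maxWaitMs has elapsed.
    // Returns true if the partition is unlocked, false if the wait timed out
    // or the thread was interrupted. (Hypothetical helper for illustration.)
    static boolean waitWhileLocked(Map<String, Boolean> locks, String topicPartition,
                                   long pollMs, long maxWaitMs) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (locks.getOrDefault(topicPartition, false)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> locks = new ConcurrentHashMap<>();
        locks.put("topicB0", true);
        // Stands in for the TimerTask that would normally clear the lock.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            locks.put("topicB0", false);
        }).start();
        System.out.println("unlocked: " + waitWhileLocked(locks, "topicB0", 10, 2_000));
    }
}
```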
This approach caused more frequent rebalances, so we changed the following Kafka consumer configuration:
    request.timeout.ms: 7300000 (~2 hrs)
    max.poll.interval.ms: 7200000 (2 hrs)
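As a sketch of how these two values might be passed to a Java consumer, the snippet below puts them in a java.util.Properties object. The class and method names are hypothetical; only the two keys and their values come from the answer, and a real consumer would also need settings such as bootstrap.servers, group.id, and deserializers.

```java
import java.util.Properties;

final class ConsumerTimeoutsSketch {

    // Builds only the two timeout settings quoted in the answer.
    static Properties longPollTimeouts() {
        Properties props = new Properties();
        // Allow up to 2 hours between polls before the consumer is considered failed.
        props.setProperty("max.poll.interval.ms", "7200000");
        // Kept slightly above max.poll.interval.ms, as in the answer.
        props.setProperty("request.timeout.ms", "7300000");
        return props;
    }

    public static void main(String[] args) {
        longPollTimeouts().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```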
For graphs and a general description of the issue, you can check my post:
How I Resolved Delays in Kafka Messages by Prioritizing Kafka Topics
Good Luck!

Looks like site is down. Should you use AtomicBoolean for this rather than a plain one? – OneCricketeer Oct 24 '18 at 01:56
@cricket_007 - Since I am using ConcurrentHashMap the booleans are thread safe – Gal S Oct 24 '18 at 05:38