I have a Kinesis consumer application built with spring-integration-aws version 1.1.0.RELEASE.

In my tests, I run two instances of this application in the same consumer group, consuming from a stream with two shards. I observed that KinesisMessageDrivenChannelAdapter distributes messages in one of three ways:

  1. All messages are delivered to one consumer
  2. Messages are distributed between both consumers (not evenly)
  3. Both consumers receive the same messages

On the producer side, messages are distributed evenly between the two shards. I would like to know how the Kinesis adapter distributes messages among consumers and, if it is supported, how I can get an even distribution among them.

Thank you

UPDATE (Adapter Configuration)

  @Bean
  public KinesisMessageDrivenChannelAdapter kinesisInboundChannelAdapter(
      AmazonKinesis amazonKinesis) {
    String[] streamNames = this.consumerClientProperties.getKinesis().getStreamNames();
    KinesisMessageDrivenChannelAdapter adapter =
        new KinesisMessageDrivenChannelAdapter(amazonKinesis, streamNames);
    adapter.setConverter(null);
    adapter.setOutputChannel(new QueueChannel());
    adapter.setCheckpointStore(dynamoDbMetaDataStore());
    adapter.setCheckpointMode(CheckpointMode.record);
    adapter.setStartTimeout(10000);
    adapter.setConsumerGroup(consumerClientProperties.getName());
    adapter.setListenerMode(ListenerMode.record);
    adapter.setDescribeStreamRetries(1);
    return adapter;
  }

  @Bean
  public DynamoDbMetadataStore dynamoDbMetaDataStore() {
    DynamoDbMetadataStore dynamoDbMetaDataStore = new DynamoDbMetadataStore(amazonDynamoDB(),
        consumerClientProperties.getName());
    return dynamoDbMetaDataStore;
  }
sansari

1 Answer

Everyone is recommended to upgrade to the latest Spring Integration AWS 2.0: https://spring.io/blog/2018/08/21/spring-integration-for-aws-2-0-ga-and-spring-cloud-stream-kinesis-binder-1-0-ga

Plenty of fixes were made at the Kinesis consumer level, and there is now a leader election so that the same shard is not subscribed to more than once.

The idea is to have strict ordering when we process records, therefore only one thread per cluster should have access to one shard. That thread may process several shards though.
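
The one-owner-per-shard rule can be pictured with a toy model. The sketch below is not the adapter's implementation; `ShardOwnershipSketch`, `tryClaim` and `ownerOf` are hypothetical names, and an in-memory map stands in for a shared lock store. It only shows that when the first claim wins, an instance that starts earlier can end up owning every shard.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ShardOwnershipSketch {

    // Hypothetical stand-in for a shared lock store: shard id -> owning instance.
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true if the instance owns the shard after the call (first claim wins). */
    public boolean tryClaim(String shardId, String instanceId) {
        String current = owners.putIfAbsent(shardId, instanceId);
        return current == null || current.equals(instanceId);
    }

    /** Returns the current owner of the shard, or null if nobody has claimed it. */
    public String ownerOf(String shardId) {
        return owners.get(shardId);
    }

    public static void main(String[] args) {
        ShardOwnershipSketch store = new ShardOwnershipSketch();
        List<String> shards = List.of("shardId-000000000000", "shardId-000000000001");

        // Instance A happens to start first and claims every shard it sees.
        for (String shard : shards) {
            store.tryClaim(shard, "instance-A");
        }

        // Instance B arrives later and finds every shard already taken.
        for (String shard : shards) {
            boolean claimed = store.tryClaim(shard, "instance-B");
            System.out.println(shard + " claimed by B: " + claimed
                    + ", owner: " + store.ownerOf(shard));
        }
    }
}
```

In this model a single owner may hold several shards, but a shard never has two owners, which is what keeps per-shard ordering intact.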

In any case, if you run two instances of the application, you need to inject a MetadataStore backed by a shared database, e.g. DynamoDbMetadataStore.

Artem Bilan
  • What great news! I will update my project with this version to see the results. Thank you! – sansari Aug 31 '18 at 14:11
  • So I updated Spring Integration AWS to the new version, as well as the Kinesis binder. I am running two instances with the same consumer group name set on the KinesisAdapter. Now both instances receive all messages from both shards. Is there any new configuration I should add? I have updated the question with my configuration. – sansari Aug 31 '18 at 16:01
  • Right, you also need to provide a `LockRegistry` for the `KinesisMessageDrivenChannelAdapter`: https://github.com/spring-projects/spring-integration-aws#lock-registry-for-amazon-dynamodb – Artem Bilan Aug 31 '18 at 16:16
  • Well, I created a `DynamoDBLockRegistry`, injected it into a `LockRegistryLeaderInitiator` and started the leader initiator. Leader selection works fine during failovers, but all messages go only to the leader instance. – sansari Aug 31 '18 at 20:16
  • Well, I told you to inject a `DynamoDBLockRegistry` into the `KinesisMessageDrivenChannelAdapter`. I didn't talk about a `LockRegistryLeaderInitiator`, but this solution should work as well. And your results are correct. To be honest there is no guarantee in the current algorithm that shards are going to be distributed evenly. – Artem Bilan Aug 31 '18 at 20:20
  • Right, I added the LockRegistry to the KinesisAdapter as well. I just want to know whether there is a random distribution between instances, or whether in my scenario I should always expect messages to be delivered to the leader instance? – sansari Aug 31 '18 at 20:26
  • Well, there is a leader selection for each shard, and we really might end up with the same app handling all the shards if the second one is too slow to pick up a shard in time – Artem Bilan Aug 31 '18 at 20:59
  • Right! In this case we would not achieve scalability at all, especially in failovers. All shards will be picked up by only one app. Is there any plan to solve this issue in upcoming releases? If not, where can I look in the code to start contributing a fix? Without this we cannot scale our Kinesis consumption. Thank you! – sansari Sep 04 '18 at 13:19
  • You can assign particular shards to different consumers. No, there is no rule that all shards are going to be assigned to a single instance. Moreover, when this instance is gone, others can pick up the abandoned shards. The code is here: https://github.com/spring-projects/spring-integration-aws. Contributions are welcome! Your concern is valid and it has crossed my mind more than once. The only problem is that I don't have time to think about this thoroughly. – Artem Bilan Sep 04 '18 at 13:26
  • That's right, but according to your previous comment the fastest instance would take all shards, and we are talking about milliseconds here. If I could assign shards manually in the code as a workaround, it would be great until I can implement shard handling among instances. – sansari Sep 04 '18 at 13:31
  • Yeah... That's true. We need to implement something like *shards interest* by a single master and manage shards distribution from there. Something like we have with Apache Kafka rebalance. – Artem Bilan Sep 04 '18 at 14:48
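
The manual shard assignment discussed in the comments could be sketched as follows. This is only one possible workaround, not something the library provides: `StaticShardAssignment`, `shardsFor`, `instanceIndex` and `instanceCount` are hypothetical names, and the sketch assumes every instance knows its own index and the total instance count (for example from configuration). Each instance sorts the shard ids the same way and takes every `instanceCount`-th one, so no coordination is needed. How the resulting ids are handed to the adapter (for instance as explicit shard offsets) is left out and would need checking against the adapter's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StaticShardAssignment {

    /**
     * Picks the shards for one instance: sort the shard ids, then take every
     * instanceCount-th shard starting at instanceIndex (round-robin).
     */
    public static List<String> shardsFor(List<String> allShardIds,
                                         int instanceIndex,
                                         int instanceCount) {
        if (instanceCount <= 0 || instanceIndex < 0 || instanceIndex >= instanceCount) {
            throw new IllegalArgumentException("instanceIndex must be in [0, instanceCount)");
        }
        List<String> sorted = new ArrayList<>(allShardIds);
        Collections.sort(sorted); // identical order on every instance
        List<String> mine = new ArrayList<>();
        for (int i = instanceIndex; i < sorted.size(); i += instanceCount) {
            mine.add(sorted.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        // Shard ids are given out of order on purpose; sorting makes the result stable.
        List<String> shards = List.of("shardId-000000000001", "shardId-000000000000");
        System.out.println(shardsFor(shards, 0, 2)); // [shardId-000000000000]
        System.out.println(shardsFor(shards, 1, 2)); // [shardId-000000000001]
    }
}
```

The trade-off is that a static split gives an even distribution but no failover: if one instance goes down, its shards stay unconsumed until the assignment is recomputed.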