
I am facing an issue. I am testing with 3 consumers and 1 producer. For all the keystrokes the producer is producing, the consumers are unable to receive all the data being sent. What could be the reason for this?

In the following screenshot, the producer sent a, b, c and d, but only d was received.

The bottom-right is the producer and the other 3 are consumers listening to the same stream. As we can see, only the consumer in the bottom left has received d; the other data has been lost.

Code that I am testing with:

Producer:

var AWS = require( 'aws-sdk' );
var kinesis = new AWS.Kinesis(); // region/credentials come from the environment

var stdin = process.openStdin();

function insert( input ) {
    var params = {
        Data: input,
        PartitionKey: 'users',
        StreamName: 'test-stream1'
    };
    kinesis.putRecord( params, function ( err, data ) {
        if ( err ) console.log( err, err.stack ); // an error occurred
        else console.log( data );                 // successful response
    } );
}

stdin.addListener( "data", function ( d ) {
    // produce the keystrokes typed by the user
    insert( d.toString().trim() );
} );

Consumer:

    var AWS = require( 'aws-sdk' );
    var kinesis = new AWS.Kinesis(); // region/credentials come from the environment

    function getRecord() {
        kinesis.describeStream( {
            StreamName: 'test-stream1'
        }, function ( err, streamData ) {
            if ( err ) {
                console.log( err, err.stack ); // an error occurred
            } else {
                streamData.StreamDescription.Shards.forEach( shard => {
                    kinesis.getShardIterator( {
                        ShardId: shard.ShardId,
                        ShardIteratorType: 'LATEST',
                        StreamName: 'test-stream1'
                    }, function ( err, shardIteratordata ) {
                        if ( err ) {
                            console.log( err, err.stack ); // an error occurred
                        } else {
                            kinesis.getRecords( {
                                ShardIterator: shardIteratordata.ShardIterator
                            }, function ( err, recordsData ) {
                                if ( err ) {
                                    console.log( err, err.stack ); // an error occurred
                                } else {
                                    recordsData.Records.forEach( record => {
                                        console.log( record.Data.toString(), shard.ShardId );
                                    } );
                                }
                            } );
                        }
                    } );
                } );
            }
        } );
    }

    setInterval( getRecord, 1000 * 1 );

I have used the iterator type LATEST so that each consumer gets the latest data from the producer.

Suhail Gupta
1 Answer

If I am not mistaken, you are always reading just after the most recent records. This is configured via ShardIteratorType: 'LATEST'. The documentation says:

LATEST - Start reading just after the most recent record in the shard, so that you always read the most recent data in the shard.

This should only be used to get the very first iterator; afterwards you need to request the next batch with an iterator that starts exactly where the last one ended.

Therefore you can use the NextShardIterator from the GetRecords response, if present, to follow up on the upcoming records. See the doc.

Currently you are discarding the iterator after each interval and starting at the very end of the shard again.

Example

I took your code and moved the setInterval so that it only repeats the getRecords request with the next iterator:

function getRecord() {
  kinesis.describeStream({ StreamName: 'test-stream1'}, function ( err, streamData ) {
    if ( err ) {
      console.log( err, err.stack ); // an error occurred
    } else {
      // console.log( streamData ); // successful response
      streamData.StreamDescription.Shards.forEach( shard => {
        kinesis.getShardIterator({
          ShardId: shard.ShardId,
          ShardIteratorType: 'LATEST',
          StreamName: 'test-stream1'
        }, function ( err, shardIteratordata ) {
          if ( err ) {
            console.log( err, err.stack ); // an error occurred
          } else {
            var shardIterator = shardIteratordata.ShardIterator;

            setInterval(function() {
              kinesis.getRecords({ ShardIterator: shardIterator }, function ( err, recordsData ) {
                if ( err ) {
                  console.log( err, err.stack ); // an error occurred
                } else {
                  // console.log( JSON.stringify( recordsData ) ); // successful response
                  recordsData.Records.forEach(record => {
                    console.log( record.Data.toString(), shard.ShardId );
                  });
                  shardIterator = recordsData.NextShardIterator; // continue where this batch ended
                }
              });
            }, 1000 * 1 );

          }
        });
      });
    }
  });
}
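For reference, the iterator-chaining pattern above can be isolated into a small helper. The sketch below uses a hand-written stub in place of the real AWS SDK client so that it runs without credentials; `makeFakeKinesis` and `pollShard` are illustrative names, not part of the SDK, though the stub mirrors the SDK's `getShardIterator`/`getRecords` callback signatures:

```javascript
// Stub standing in for the AWS SDK Kinesis client (illustrative only).
// It serves the given batches of records in order, then empty batches.
function makeFakeKinesis( batches ) {
  let cursor = 0;
  return {
    getShardIterator( params, cb ) {
      cb( null, { ShardIterator: 'it-0' } );
    },
    getRecords( params, cb ) {
      const records = batches[cursor] || [];
      cursor += 1;
      cb( null, {
        Records: records.map( d => ( { Data: Buffer.from( d ) } ) ),
        NextShardIterator: 'it-' + cursor
      } );
    }
  };
}

// Poll one shard a fixed number of times, always continuing from
// NextShardIterator instead of requesting a fresh LATEST iterator.
function pollShard( kinesis, streamName, shardId, onRecord, times ) {
  kinesis.getShardIterator( {
    ShardId: shardId,
    ShardIteratorType: 'LATEST',
    StreamName: streamName
  }, function ( err, data ) {
    if ( err ) return console.log( err, err.stack );
    let shardIterator = data.ShardIterator;
    ( function next( remaining ) {
      if ( remaining === 0 ) return;
      kinesis.getRecords( { ShardIterator: shardIterator }, function ( err, recordsData ) {
        if ( err ) return console.log( err, err.stack );
        recordsData.Records.forEach( r => onRecord( r.Data.toString() ) );
        shardIterator = recordsData.NextShardIterator; // chain, do not restart
        next( remaining - 1 );
      } );
    } )( times );
  } );
}

const fake = makeFakeKinesis( [ [ 'a', 'b' ], [ 'c' ] ] );
const seen = [];
pollShard( fake, 'test-stream1', 'shardId-000000000000', d => seen.push( d ), 3 );
console.log( seen.join( ',' ) ); // a,b,c
```

Because the iterator is carried over between calls, every record is seen exactly once per reader, which is the behaviour the original code was missing.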
Fionn
  • Could you give an example? – Suhail Gupta Jan 25 '18 at 16:28
  • Okay, that works. But I keep getting duplicates, i.e. records that have already been read. Is there a way to avoid them if I have multiple consumers? – Suhail Gupta Jan 29 '18 at 06:58
  • To avoid having duplicates across all consumers I would suggest having one consumer per shard. In the scenario you described it would mean 3 shards, each with its own consumer. This will make sure you are not getting duplicates. – Fionn Jan 29 '18 at 16:26
  • Yeah, did that. One consumer and multiple children. – Suhail Gupta Jan 29 '18 at 17:29
  • But won't this increase my cost? Also, is there a way to distribute data among different nodes? – Suhail Gupta Jan 30 '18 at 11:11
  • Also, is there a way I can read data from a particular shard?? – Suhail Gupta Jan 30 '18 at 11:17
  • The records in Kinesis can be consumed multiple times and do not vanish after you read them once. If you want to rely on getting each record only once you need to keep iterating on the same `shardIterator` chain. This will not speed up the consumer part, which I think is what you want to achieve by having 3 consumers. To speed this up you can divide your stream into multiple shards and attach 1 consumer per shard (https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html). This will produce more cost! You can then skip listing the shards and hardcode them in each consumer. – Fionn Jan 31 '18 at 07:22
  • How do I scale? What if the consumer reading from a particular shard goes down? – Suhail Gupta Jan 31 '18 at 21:14
  • To scale up a Kinesis stream you usually increase the number of shards (https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html). This is necessary as the limitation in Kinesis is _1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second_ per shard (https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html). If one consumer goes down, the data is kept for up to 7 days (https://docs.aws.amazon.com/streams/latest/dev/kinesis-extended-retention.html) and the consumer can later continue iterating on the shard. – Fionn Feb 02 '18 at 11:18