I have a (legacy) Kinesis Analytics SQL application that computes the 10 most frequent items in a 1-minute tumbling window using the TOP_K_ITEMS_TUMBLING function:

CREATE OR REPLACE STREAM "TOP_N_STREAM" 
("myItem" VARCHAR(256), "frequency" BIGINT);

CREATE OR REPLACE PUMP "TOP_N_PUMP" AS 
INSERT INTO "TOP_N_STREAM"
SELECT STREAM * 
FROM TABLE (TOP_K_ITEMS_TUMBLING(
    CURSOR(SELECT STREAM * FROM "SOURCE_STREAM"), 
    'myItem', 
    10, -- top N
    60) -- 1 minute window
);

I have configured a Lambda function as the destination for this stream so that I can do some processing on these top 10 items. The problem is that not all of the data seems to be delivered to the Lambda. For example, if one item is much more frequent than the others, that item is never delivered to the Lambda.

For example, consider this output from TOP_N_STREAM:

item1   217342
item2   1411
item3   1284
item4   1092
item5   975
item6   661
item7   645
item8   381
item9   335
item10  319

item1 is never delivered to the Lambda, or at least it never shows up in the Lambda logs. Does anyone have a clue why this happens? Is it related to the number of shards, computing power, or concurrency?
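
For reference, here is a minimal sketch of a destination Lambda that logs every record it receives, which is one way to check whether item1 reaches the function at all. It is not the actual handler from this application. It assumes the output is configured as JSON and that the event carries a "records" list whose entries have a "recordId" and a base64-encoded "data" field, with a per-record "Ok" or "DeliveryFailed" result returned to the application. That shape is my understanding of the Kinesis Data Analytics Lambda output model and should be checked against the AWS documentation. The helper name decode_record is made up for illustration.

```python
import base64
import json


def decode_record(record):
    """Decode one base64-encoded output record into a dict (assumes JSON output)."""
    return json.loads(base64.b64decode(record["data"]).decode("utf-8"))


def lambda_handler(event, context):
    """Log every delivered row and acknowledge each record individually."""
    results = []
    for record in event.get("records", []):
        record_id = record.get("recordId")
        try:
            row = decode_record(record)
            # One log line per row, so a missing item is easy to spot in the logs.
            print(json.dumps({"recordId": record_id, "row": row}))
            results.append({"recordId": record_id, "result": "Ok"})
        except (ValueError, KeyError) as exc:
            # ValueError covers bad base64, bad UTF-8, and bad JSON.
            print(f"could not decode record {record_id}: {exc}")
            results.append({"recordId": record_id, "result": "DeliveryFailed"})
    return {"records": results}
```

Logging the raw row before any processing makes it possible to tell a record that was never delivered apart from one that was delivered but dropped by later code in the handler.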

John Rotenstein
revy