0

I have a kinesis analytics application set up which takes data from a kinesis stream which has the following schema.

--------------------------
Column          ColumnType
--------------------------
Level           varchar(10)
RootID          varchar(32)
ProcessID       varchar(16)
EntityName      varchar(64)
Message         varchar(512)
Threshold       varchar(32)
TriggerTime     timestamp

My objective is to create a realtime kinesis analytics solution which segregates the records with level "OVERFLOW" and groups them based on the RootID. All records belonging to a RootID are ideally expected to reach kinesis within a span of 5 minutes. So I am thinking of setting up a stagger window for this and so far I have come up with this SQL.

CREATE OR REPLACE STREAM "OVERFLOW_SQL_STREAM" (
    "Level" varchar (10),
    "RootID" varchar (32),
    "ProcessID" varchar(16),
    "EntityName" varchar(64),
    "Message" varchar(512),
    "Threshold" varchar(32),
    "TriggerTime" timestamp
);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "OVERFLOW_SQL_STREAM"
        SELECT STREAM 
            "Level" varchar (10),
            "RootID" varchar (32),
            "ProcessID" varchar(16),
            "EntityName" varchar(64),
            "Message" varchar(512),
            "Threshold" varchar(32),
            "TriggerTime" timestamp
        FROM "SOURCE_SQL_STREAM_001"
        WHERE "Level" like "OVERFLOW"
        WINDOWED BY STAGGER (
            PARTITION BY "RootID",FLOOR("TriggerTime" TO MINUTE) RANGE INTERVAL '5' MINUTE);

I received an error in the SQL stating that "PARTITION BY clause doesn't have the column 'Level'". I don't understand why should I add that column to partition as I want my records to be partitioned by only by the RootID column and not by any other. Adding that column throws error saying that I should add the next column and so on. I couldn't understand the error. Kindly help me!Thanks!

RobustPath004
  • 57
  • 1
  • 8

1 Answers1

0

There is a workaround for this type of problem. You can Use FIRST_VALUE() or LAST_VALUE() to cast the result instead of directly passing them.

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "OVERFLOW_SQL_STREAM"
    SELECT STREAM 
        LAST_VALUE("Level") AS Level,
       "RootID" varchar (32),
       ....
       ....
       ....
       "TriggerTime" timestamp
    FROM "SOURCE_SQL_STREAM_001"
    WHERE "Level" like "OVERFLOW"
    WINDOWED BY STAGGER (
        PARTITION BY "RootID",FLOOR("TriggerTime" TO MINUTE) RANGE INTERVAL '5' MINUTE);

This is the way you can create the stream pump without adding into the PARTITION BY clause.

FIRST_VALUE() -- To get the very first value of level matched by the stream partition (Here RootID)
LAST_VALUE() -- Viceversa