
My use case: collect events for a particular duration and then group them based on the key.

Objective: after processing, the user can save the data for a particular duration, grouped by the key.

How I am planning to do it:

1) Receive events from Kafka

2) Create a data stream of events

3) Associate a table with it and collect data for a particular duration by running a SQL query

4) Associate a new table with the step-3 output and group the collected data according to the key

5) Save the data in the DB
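For reference, a minimal sketch of the setup this plan assumes is shown below (the broker address, topic name, consumer group, SimpleStringSchema, and the StreamTableEnvironment.create factory are my assumptions, not given in the question; older Flink releases use a different table-environment factory and versioned Kafka consumer classes, and the usual Flink/Kafka-connector imports are assumed):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
props.setProperty("group.id", "sensor-group");            // placeholder consumer group

// step 1: read the raw events from Kafka as strings
DataStream<String> source = env.addSource(
        new FlinkKafkaConsumer<>("sensor-topic", new SimpleStringSchema(), props));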

Solution I tried:

I am able to:

1) receive events from Kafka,

2) set up a data stream (let's say sensorDataStream):

DataStream<SensorEvent> sensorDataStream =
        source.flatMap(new FlatMapFunction<String, SensorEvent>() {
            @Override
            public void flatMap(String catalog, Collector<SensorEvent> out) {
                // create SensorEvent(id, sensor notification value, notification time)
                // from the raw string and emit it with out.collect(...)
            }
        });

3) associate a table (let's say table1) with the data stream and run a SQL query like the following (a sketch of how this wiring might be registered is included after step 6 below):

SELECT id, sensorNotif, notifTime FROM SENSORTABLE WHERE notifTime > t1_Timestamp AND notifTime < t2_Timestamp

Here t1_Timestamp and t2_Timestamp are predefined epoch times and will change based on some predefined conditions.

4) I am able to print this SQL query result on the console using:

tableEnv.toAppendStream(table1, Row.class).print();

5) created a new table (let's say table2) from table1 using the following type of SQL query:

Table table2 = tableEnv.sqlQuery("SELECT id AS SensorID, COUNT(sensorNotif) AS SensorNotificationCount FROM table1 GROUP BY id");

6) collect and print the data using:

tableEnv.toRetractStream(table2, Row.class).print();
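For clarity, here is a hedged sketch of how steps 3 to 6 might be wired together; the registerDataStream/registerTable calls, the field names, and treating t1_Timestamp/t2_Timestamp as long variables spliced into the query string are assumptions on my part (newer Flink versions use createTemporaryView instead):

// register the stream as SENSORTABLE with the fields used in the filter query
tableEnv.registerDataStream("SENSORTABLE", sensorDataStream, "id, sensorNotif, notifTime");

// step 3: keep only the events inside the (t1, t2) interval
Table table1 = tableEnv.sqlQuery(
        "SELECT id, sensorNotif, notifTime FROM SENSORTABLE " +
        "WHERE notifTime > " + t1_Timestamp + " AND notifTime < " + t2_Timestamp);

// the second query refers to table1 by name, so it must be registered too
tableEnv.registerTable("table1", table1);

// step 5: group the filtered events by id
Table table2 = tableEnv.sqlQuery(
        "SELECT id AS SensorID, COUNT(sensorNotif) AS SensorNotificationCount " +
        "FROM table1 GROUP BY id");

// step 6: a GROUP BY without a window is a continuously updating result, hence the retract stream
tableEnv.toRetractStream(table2, Row.class).print();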

Problem

1) I am not able to see the output of step 6 on the console.

I did some experimenting and found that if I skip the table1 setup step (that means no sensor data clubbing, i.e. collecting, for a duration) and directly associate my sensorDataStream with table2, then I can see the output of step 6. But as this is a retract stream, I see the data in the form of add/retract messages: if a new event comes in, the retract stream invalidates the earlier data and prints the newly calculated data.

Suggestions I would like to have

1) How can I merge step 5 and step 6 (meaning table1 and table2)? I already merged these tables, but as the data is not visible on the console I have doubts. Am I doing something wrong? Or is the data merged but just not visible?

2) My plan is to:

2.a) filter the data in two passes: in the first pass filter the data for a particular interval, and in the second pass group this data

2.b) save the output of 2.a in the DB. Will this approach work? (I have doubts because I am using a data stream, and the table1 output is an append stream while the table2 output is a retract stream.)
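On 2.b: toRetractStream produces Tuple2<Boolean, Row> pairs, where the Boolean flag is true for add messages and false for retractions. A hedged sketch of one way to persist the grouped counts is shown below; SensorCountJdbcSink is a hypothetical SinkFunction I am assuming, not an existing Flink class:

tableEnv.toRetractStream(table2, Row.class)
        .filter(change -> change.f0)          // keep only the "add" messages, drop retractions
        .addSink(new SensorCountJdbcSink());  // hypothetical SinkFunction<Tuple2<Boolean, Row>>
                                              // that upserts (SensorID, count) into the DB

Because table2 is a continuously updating aggregate, such a sink should overwrite the existing row for a SensorID rather than append a new one; otherwise the DB will accumulate every intermediate count.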

flinkuser
  • Can you explain what data "clubbing" is? I'm not familiar with this term, and google isn't helping. – David Anderson Jul 17 '19 at 19:28
  • Hi David, I modified the description. What I mean by "club the data" is "collect the data" based on some criteria – flinkuser Jul 17 '19 at 19:59
  • Not sure what's wrong, but I would try rewriting this as a single query (with a sub-query). – David Anderson Jul 17 '19 at 20:12
  • I also think that my queries should work, but as I am not able to check the output of table2 I have doubts. I want to check my table2 data for better understanding. So do you think adding a sink (DB) will help me to visualize the data? Or can you suggest some other way so that I can check/analyze my data? – flinkuser Jul 17 '19 at 21:42
  • I have found that using Flink's SQL client speeds up my explorations considerably. And I've found that trying a query a bunch of different ways helps me understand what's going on. That's why I suggested the sub-query approach. You could also transform table1 into a view; that might bring some insights. – David Anderson Jul 18 '19 at 08:13
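For reference, the single-query rewrite suggested in the comments might look roughly like this (a sketch only; SENSORTABLE is the registered stream and t1_Timestamp/t2_Timestamp are assumed to be the same predefined long values as above):

Table combined = tableEnv.sqlQuery(
        "SELECT id AS SensorID, COUNT(sensorNotif) AS SensorNotificationCount " +
        "FROM (SELECT id, sensorNotif FROM SENSORTABLE " +
        "      WHERE notifTime > " + t1_Timestamp + " AND notifTime < " + t2_Timestamp + ") AS filtered " +
        "GROUP BY id");

tableEnv.toRetractStream(combined, Row.class).print();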

0 Answers