
I'm facing a problem where I don't get any results from my Flink SQL query.

I have some information stored in two Kafka topics. I want to load each topic into a table and join the two tables in a streaming fashion.

These are my Flink instructions:

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
        // configure Kafka consumer
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // Broker default host:port
        props.setProperty("group.id", "flink-consumer"); // Consumer group ID

        FlinkKafkaConsumer011<Blocks> flinkBlocksConsumer = new FlinkKafkaConsumer011<>(args[0], new BlocksSchema(), props);
        flinkBlocksConsumer.setStartFromEarliest();

        FlinkKafkaConsumer011<Transactions> flinkTransactionsConsumer = new FlinkKafkaConsumer011<>(args[1], new TransactionsSchema(), props);
        flinkTransactionsConsumer.setStartFromEarliest();

        DataStream<Blocks> blocks = env.addSource(flinkBlocksConsumer);

        DataStream<Transactions> transactions = env.addSource(flinkTransactionsConsumer);

        tableEnv.registerDataStream("blocksTable", blocks);
        tableEnv.registerDataStream("transactionsTable", transactions);

Here is my SQL query:

Table sqlResult
   = tableEnv.sqlQuery(
       "SELECT block_timestamp,count(tx_hash) " +
       "FROM blocksTable " +
       "JOIN transactionsTable " +
       "ON blocksTable.block_hash=transactionsTable.tx_hash " +
       "GROUP BY blocksTable.block_timestamp");
DataStream<Test> resultStream = tableEnv
        .toRetractStream(sqlResult, Row.class)
        .map(t -> {
             Row r = t.f1;
             String timestamp = r.getField(0).toString();
             long count = Long.parseLong(r.getField(1).toString());
             return new Test(timestamp, count);
             })
        .returns(Test.class);
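One detail worth noting: `toRetractStream` emits `Tuple2<Boolean, Row>` pairs, where `f0` is `true` for an added row and `false` for a retracted one. The map above reads only `f1`, so retraction messages would also come out as `Test` records. A minimal plain-Java sketch of the intended filtering (the retract stream is simulated here with plain collections, outside Flink; this is only an illustration, not the actual API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class RetractFilterSketch {
    // Keep only accumulate messages (flag == true), mirroring a
    // .filter(t -> t.f0) placed before the map on the retract stream.
    static List<String> keepAccumulates(List<Entry<Boolean, Object[]>> stream) {
        List<String> results = new ArrayList<>();
        for (Entry<Boolean, Object[]> t : stream) {
            if (t.getKey()) {
                results.add(t.getValue()[0] + "," + t.getValue()[1]);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Simulated retract stream: (isAccumulate, [block_timestamp, count])
        List<Entry<Boolean, Object[]>> retractStream = new ArrayList<>();
        retractStream.add(new SimpleEntry<>(true,  new Object[]{"ts1", 1L})); // first result
        retractStream.add(new SimpleEntry<>(false, new Object[]{"ts1", 1L})); // retract it
        retractStream.add(new SimpleEntry<>(true,  new Object[]{"ts1", 2L})); // updated result
        System.out.println(keepAccumulates(retractStream)); // [ts1,1, ts1,2]
    }
}
```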

Then, I print the results:

resultStream.print();
env.execute(); // the streaming job only starts once execute() is called

But I don't get any results; my program seems stuck...
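For reference, this is what I expect the query to compute, sketched as a plain in-memory Java computation (field names are taken from the query; this only illustrates the expected semantics, it is not Flink code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinCountSketch {
    // In-memory equivalent of:
    //   SELECT block_timestamp, count(tx_hash) FROM blocksTable
    //   JOIN transactionsTable ON blocksTable.block_hash = transactionsTable.tx_hash
    //   GROUP BY blocksTable.block_timestamp
    static Map<String, Long> joinCount(Map<String, String> blockHashToTimestamp,
                                       List<String> txHashes) {
        Map<String, Long> counts = new HashMap<>();
        for (String txHash : txHashes) {
            // join condition: blocksTable.block_hash = transactionsTable.tx_hash
            String ts = blockHashToTimestamp.get(txHash);
            if (ts != null) {
                counts.merge(ts, 1L, Long::sum); // count(tx_hash) per block_timestamp
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> blocks = new HashMap<>();
        blocks.put("h1", "t1");
        blocks.put("h2", "t1");
        System.out.println(joinCount(blocks, List.of("h1", "h2", "h2", "unmatched")));
    }
}
```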

Regarding the schemas used for serialization and deserialization, here is the schema for my Test class, which stores the result of my query (two fields: a String for the block_timestamp and a long for the count):

public class TestSchema implements DeserializationSchema<Test>, SerializationSchema<Test> {

    @Override
    public Test deserialize(byte[] message) throws IOException {
        return Test.fromString(new String(message));
    }

    @Override
    public boolean isEndOfStream(Test nextElement) {
        return false;
    }

    @Override
    public byte[] serialize(Test element) {
        return element.toString().getBytes();
    }

    @Override
    public TypeInformation<Test> getProducedType() {
        return TypeInformation.of(Test.class);
    }
}

The BlocksSchema and TransactionsSchema classes follow the same principle.
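Since the Test class itself is not shown, here is a hypothetical sketch of the `toString()`/`fromString()` round-trip that TestSchema relies on (the field names are assumed, not taken from the question; adapt to the real class):

```java
public class Test {
    public String blockTimestamp; // assumed field name, not shown in the question
    public long count;

    public Test() { }             // no-arg constructor, required by Flink's POJO rules

    public Test(String blockTimestamp, long count) {
        this.blockTimestamp = blockTimestamp;
        this.count = count;
    }

    @Override
    public String toString() {
        return blockTimestamp + "," + count;
    }

    // Must be the exact inverse of toString(): TestSchema.deserialize() depends on it
    public static Test fromString(String s) {
        int i = s.lastIndexOf(',');
        return new Test(s.substring(0, i), Long.parseLong(s.substring(i + 1)));
    }
}
```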

Do you know why I can't get the result of my query? Should I test with a BatchExecutionEnvironment?

  • Do you get results for a simpler query, like `SELECT * FROM blocksTable`? – Fabian Hueske Aug 28 '18 at 14:05
  • Yes, sorry I forgot to specify that. SELECT * from blocksTable works fine. – Gatsby Aug 28 '18 at 14:16
  • Hmm, the query looks good to me. I guess you are sure that the topic contain records that should join, right? `BatchTableEnvironment` won't work, because it does not support Kafka as data source. – Fabian Hueske Aug 28 '18 at 15:41
  • Yes, the topics contain records that should join; I already tried this join in Hive and it worked. Maybe it's just a matter of time? I have 13 GB of records here, whereas with Hive I tried with 1 GB. – Gatsby Aug 29 '18 at 07:36
  • Could you post a link to a branch that uses `env.fromElements()` to reproduce your problem? Then someone could investigate it with a debugger. – twalthr Aug 29 '18 at 12:31
  • This is the example I implemented using env.fromElements() : https://gist.github.com/christopheblp/89614c0c384db0fba1d46a66427e9348 It works fine, so I think the problem is the large amount of data combined with the two separate Kafka topics. – Gatsby Aug 31 '18 at 08:28
  • I decided to denormalize the data by putting it into a single table backed by one Kafka topic, and it now works since there is no longer any join. – Gatsby Aug 31 '18 at 08:30

0 Answers