How can I pass an entire Row to my ScalarFunction RowToTupleConverter in the following code? All the examples I have found only cover passing single or multiple values by name, but I want the whole result of the SELECT statement to be passed in as one Row. My guess was to use *, but that is not recognized as a valid parameter.

envT.registerFunction("toTuple", new RowToTupleConverter());
envT.createTemporaryView("t", envT.fromDataStream(ds));
Table result = envT.from("t").select("getAvroFieldString(f1, 'HASH_KEY') as hk, "
               + "getAvroFieldLong(f1, 'LOAD_DATE') as ld, 'test' as NAME");
envT.toAppendStream(result.select("*").map("toTuple(*)"), new TupleTypeInfo[...]).print();

I do not want to address the individual fields but the whole row, since I'm building everything up generically; my ScalarFunction therefore requires a parameter of type Row. The function iterates through the row and creates a Tuple2<GenericRecord,GenericRecord> from its values.
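
For context, a stripped-down version of that converter looks roughly like this (a sketch only: the schemas are assumed to be known up front and passed in as strings, and the split of row fields into key and value records is simplified placeholder logic):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.types.Row;

public class RowToTupleConverter extends ScalarFunction {

    // Schemas are held as strings because function instances must be serializable.
    private final String keySchemaStr;
    private final String valueSchemaStr;
    private transient Schema keySchema;
    private transient Schema valueSchema;

    public RowToTupleConverter(String keySchemaStr, String valueSchemaStr) {
        this.keySchemaStr = keySchemaStr;
        this.valueSchemaStr = valueSchemaStr;
    }

    // The whole row is supposed to arrive here as a single Row instance.
    public Tuple2<GenericRecord, GenericRecord> eval(Row row) {
        if (keySchema == null) {
            keySchema = new Schema.Parser().parse(keySchemaStr);
            valueSchema = new Schema.Parser().parse(valueSchemaStr);
        }
        GenericRecord key = new GenericData.Record(keySchema);
        GenericRecord value = new GenericData.Record(valueSchema);
        // Placeholder logic: fill the key fields first, then the value fields,
        // in the positional order of the row.
        int i = 0;
        for (Schema.Field f : keySchema.getFields()) {
            key.put(f.name(), row.getField(i++));
        }
        for (Schema.Field f : valueSchema.getFields()) {
            value.put(f.name(), row.getField(i++));
        }
        return Tuple2.of(key, value);
    }
}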

Background:

The job is built up like this because we need both the key and the value from a Kafka source using the Confluent Schema Registry, and the job should be generic enough to handle an arbitrary schema, so that it can be instantiated multiple times without changing the codebase. The only way we found to achieve this is creating a DataStream of Tuple2<GenericRecord,GenericRecord> from a FlinkKafkaConsumer, where the Tuple2 holds the key and the value of a message, each as an instance of GenericRecord, and transforming this into a Flink table. Since GenericRecord is a black box to the Table API, I followed recommendations in another thread and created simple ScalarFunctions which extract the specific values I need. Right now that part is still hardcoded, but once everything works it will also be made generic. However, I'm struggling to wrap the result table back into a Tuple2 in order to write the transformed records back to another Kafka topic, which is why I introduced the additional ScalarFunction to map from a Row to a Tuple2<GenericRecord,GenericRecord>.
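
To illustrate the setup, this is roughly how the source side is wired (a condensed sketch: the KeyValueDeserializer class name, the topic name, and the surrounding variables are simplified stand-ins for the generic wiring in the real job):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class KeyValueDeserializer implements KafkaDeserializationSchema<Tuple2<GenericRecord, GenericRecord>> {

    private final DeserializationSchema<GenericRecord> keyDeserializer;
    private final DeserializationSchema<GenericRecord> valueDeserializer;

    public KeyValueDeserializer(Schema keySchema, Schema valueSchema, String registryUrl) {
        this.keyDeserializer = ConfluentRegistryAvroDeserializationSchema.forGeneric(keySchema, registryUrl);
        this.valueDeserializer = ConfluentRegistryAvroDeserializationSchema.forGeneric(valueSchema, registryUrl);
    }

    @Override
    public Tuple2<GenericRecord, GenericRecord> deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        // Assumes every message carries both a key and a value.
        return Tuple2.of(keyDeserializer.deserialize(record.key()), valueDeserializer.deserialize(record.value()));
    }

    @Override
    public boolean isEndOfStream(Tuple2<GenericRecord, GenericRecord> nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Tuple2<GenericRecord, GenericRecord>> getProducedType() {
        return new TupleTypeInfo<>(
            TypeExtractor.getForClass(GenericRecord.class),
            TypeExtractor.getForClass(GenericRecord.class));
    }
}

// env, the schemas, registryUrl, and kafkaProps come from the surrounding job setup.
DataStream<Tuple2<GenericRecord, GenericRecord>> ds = env.addSource(
    new FlinkKafkaConsumer<>("input-topic", new KeyValueDeserializer(keySchema, valueSchema, registryUrl), kafkaProps));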

Is this possible, and if so, how? If not, what kind of workaround could I use to solve this problem? I would also appreciate suggestions for a more elegant approach in general, but judging from the amount of research I have done in that direction, and given the nature of the use case, I doubt there is one. Unfortunately, moving to SpecificRecord is not an option.
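
For instance, would it be a viable fallback to leave the Table API before the conversion and do the Row-to-Tuple2 mapping in the DataStream API instead? Something along these lines (a sketch only; convertRowToTuple is a hypothetical helper with the same placeholder logic as the converter sketch above):

// Convert the result table to a DataStream<Row> first ...
DataStream<Row> rows = envT.toAppendStream(result, Row.class);

// ... and map Row -> Tuple2 outside the Table API, where arbitrary
// return types are not a problem.
DataStream<Tuple2<GenericRecord, GenericRecord>> tuples = rows
    .map(new MapFunction<Row, Tuple2<GenericRecord, GenericRecord>>() {
        @Override
        public Tuple2<GenericRecord, GenericRecord> map(Row row) {
            // convertRowToTuple: hypothetical helper, same field-to-record
            // logic as in the eval() sketch above
            return convertRowToTuple(row);
        }
    })
    .returns(new TupleTypeInfo<>(
        TypeExtractor.getForClass(GenericRecord.class),
        TypeExtractor.getForClass(GenericRecord.class)));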

kopaka
  • Did you find any solutions? I have the same issue. – Grant Jan 20 '21 at 00:24
  • @Grant I have to disappoint you, I did not manage to get this running, which ultimately led me to give up on that prototype. In the meantime there have been a few new Flink releases, so maybe it is possible now, but you will probably have to dig deep into the documentation and the codebase. – kopaka Jan 20 '21 at 10:49
  • I figured it out with a user-defined aggregate function, but it is still tricky. – Grant Jan 29 '21 at 05:49
  • @Grant Nice, could you maybe provide a snippet of your solution? – kopaka Feb 01 '21 at 10:22
  • Please refer to this question: https://stackoverflow.com/questions/65896143/select-all-fields-as-json-string-as-new-field-in-flink-sql/65949284?noredirect=1#comment116626227_65949284 – Grant Feb 03 '21 at 03:27

0 Answers