I have a DataStream that I am converting to a Table so that I can use Flink SQL syntax on it. My Flink SQL query includes a user-defined table function (UDTF), meaning each input row results in multiple output rows, using a generator as in the documentation. Here is my UDTF:
from pyflink.common import Row
from pyflink.table.udf import TableFunction


class CalculateValueDiff(TableFunction):
    # The SQL call below passes three arguments, so eval takes self plus those three.
    def eval(self, ids, grouped_values, is_t1s):
        t1_players = [id for id, is_t1 in zip(ids, is_t1s) if is_t1]
        t2_players = [id for id, is_t1 in zip(ids, is_t1s) if not is_t1]
        # Emit one row per (team 1 player, team 2 player) pair.
        for t1_player in t1_players:
            for t2_player in t2_players:
                value_diff = abs(grouped_values[ids.index(t1_player)] - grouped_values[ids.index(t2_player)])
                yield Row(value_diff, t1_player, t2_player)
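To make the intended output concrete, here is a minimal plain-Python sketch of the same pairing logic with no Flink dependency. The function name `value_diffs` and the sample inputs are made up for illustration, and plain tuples stand in for `Row`.

```python
def value_diffs(ids, grouped_values, is_t1s):
    """Yield (value_diff, t1_player, t2_player) for every cross-team pair."""
    t1_players = [i for i, is_t1 in zip(ids, is_t1s) if is_t1]
    t2_players = [i for i, is_t1 in zip(ids, is_t1s) if not is_t1]
    for t1_player in t1_players:
        for t2_player in t2_players:
            diff = abs(grouped_values[ids.index(t1_player)] - grouped_values[ids.index(t2_player)])
            yield (diff, t1_player, t2_player)


# Hypothetical sample: ids 1 and 2 are on team 1, id 3 is on team 2.
rows = list(value_diffs([1, 2, 3], [10.0, 4.0, 7.0], [True, True, False]))
```

With this sample, one grouped input produces two output rows, one per team 1 / team 2 pair, which is the behavior I want from the UDTF.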
Once I register this UDTF as calculate_value_diff (along with custom collect_list_<T> aggregators for different data types, which are working fine), I run this SQL query:
res_table = t_env.sql_query("""
    SELECT
        message_time,
        calculate_value_diff(
            collect_list_int(id),
            collect_list_float(value),
            collect_list_bool(is_t1)
        ) as value_diffs
    FROM InputTable
    GROUP BY message_time
""")
res_stream = t_env.to_changelog_stream(res_table)
The issue I'm having is that the output of the UDTF is technically a generator, not a row. So when I call t_env.to_changelog_stream(res_table), I get the error: AttributeError: 'generator' object has no attribute 'get_fields_by_names'. It seems that the stream converter expects a row (it works fine if I replace the yield with return, but then I'm obviously only getting the first output value from the UDTF). Any advice?
Not sure how to proceed from here. What really needs to happen is that the pyflink.fn_execution.coder_impl_fast.RowCoderImpl.encode_to_stream function needs to be adjusted to iterate through the generator if the value is a generator, but is there a way around this?
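To illustrate the kind of adjustment I mean, here is a plain-Python sketch. `fake_udtf` and `iter_results` are hypothetical names used only for illustration; this is not PyFlink's actual code, just the generator check I have in mind.

```python
import inspect


def fake_udtf(values):
    # Stand-in for a UDTF eval() that yields several results per input row.
    for v in values:
        yield (v,)


def iter_results(result):
    """Return an iterator of rows whether `result` is one row or a generator of rows."""
    if inspect.isgenerator(result):
        return result        # already produces one row at a time
    return iter([result])    # wrap a single row so callers can always loop


gen = fake_udtf([1, 2, 3])
# The generator object itself has no row attributes, consistent with the AttributeError.
has_row_attr = hasattr(gen, "get_fields_by_names")
rows_from_gen = list(iter_results(gen))
rows_from_single = list(iter_results((42,)))
```

The point is only that a caller could treat both cases uniformly by looping over the result instead of assuming a single row.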