I have a DataStream that I am converting to a Table so that I can use Flink SQL on it. My Flink SQL query includes a user-defined table function (UDTF), meaning each input row results in multiple output rows, using a generator as in the documentation. Here is my UDTF:

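from pyflink.common import Row
from pyflink.table.udf import TableFunction

class CalculateValueDiff(TableFunction):
    # eval is an instance method, so its first parameter must be self;
    # the three remaining parameters match the three arguments in the SQL call.
    def eval(self, ids, grouped_values, is_t1s):
        t1_players = [id for id, is_t1 in zip(ids, is_t1s) if is_t1]
        t2_players = [id for id, is_t1 in zip(ids, is_t1s) if not is_t1]

        # Emit one row per (team 1 player, team 2 player) pair.
        for t1_player in t1_players:
            for t2_player in t2_players:
                value_diff = abs(grouped_values[ids.index(t1_player)] - grouped_values[ids.index(t2_player)])
                yield Row(value_diff, t1_player, t2_player)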

Once I register this UDTF as calculate_value_diff (along with custom collect_list_<T> aggregators for different data types, which are working fine), I run this SQL query:

res_table = t_env.sql_query("""
    SELECT
        message_time,
        calculate_value_diff(
            collect_list_int(id),
            collect_list_float(value),
            collect_list_bool(is_t1)
        ) as value_diffs
    FROM InputTable
    GROUP BY message_time
""")

res_stream = t_env.to_changelog_stream(res_table)

The issue I'm having is that the output of the UDTF is technically a generator, not a row. So when I call t_env.to_changelog_stream(res_table), I get the error: AttributeError: 'generator' object has no attribute 'get_fields_by_names'. It seems that the stream converter expects a row: it works fine if I replace the yield with return, but then I'm obviously only getting the first output value for the UDTF. Any advice?
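To make the generator-versus-row distinction concrete, here is a minimal plain-Python sketch of the same pairing logic with no Flink involved. The function name value_diffs, the use of plain tuples in place of Row, and the sample values are all assumptions made for illustration only.

```python
import inspect


def value_diffs(ids, grouped_values, is_t1s):
    """Yield (value_diff, t1_player, t2_player) for every cross-team pair."""
    t1_players = [i for i, is_t1 in zip(ids, is_t1s) if is_t1]
    t2_players = [i for i, is_t1 in zip(ids, is_t1s) if not is_t1]
    for t1_player in t1_players:
        for t2_player in t2_players:
            diff = abs(grouped_values[ids.index(t1_player)]
                       - grouped_values[ids.index(t2_player)])
            yield (diff, t1_player, t2_player)


# Made-up sample data: player 1 is on team 1, players 2 and 3 are on team 2.
result = value_diffs([1, 2, 3], [10.0, 4.0, 7.5], [True, False, False])

# Calling a function that contains yield returns a generator object,
# which has none of the attributes a row-like object would have.
is_generator = inspect.isgenerator(result)
is_row_like = hasattr(result, "get_fields_by_names")

# The individual rows only appear once the generator is iterated.
rows = list(result)
```

In this sketch the call itself produces a single generator object, and the two pair rows exist only after iterating it, which seems consistent with the AttributeError above being raised on the generator rather than on a row.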

I'm not sure how to proceed from here. What really needs to happen is that the pyflink.fn_execution.coder_impl_fast.RowCoderImpl.encode_to_stream function needs to be adjusted to iterate through the generator if the value is a generator, but is there a way around this?
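Roughly, the kind of adjustment I have in mind looks like the plain-Python sketch below. This is not PyFlink code: encode_row and encode_value are hypothetical stand-ins, and appending to a list is only an assumed placeholder for writing to the real output stream.

```python
import inspect


def encode_row(row, out):
    """Hypothetical per-row encoder: records the row as a tuple."""
    out.append(tuple(row))


def encode_value(value, out):
    """Encode one row, or every row of a generator if one is passed."""
    if inspect.isgenerator(value):
        # Generator result: encode each yielded row in turn.
        for row in value:
            encode_row(row, out)
    else:
        # Single row result: encode it directly.
        encode_row(value, out)


def one_row():
    return (1.0, 1, 2)


def many_rows():
    yield (6.0, 1, 2)
    yield (2.5, 1, 3)


encoded = []
encode_value(one_row(), encoded)    # function that uses return
encode_value(many_rows(), encoded)  # function that uses yield
```

The point of the sketch is only the isgenerator branch; whether the real coder could be changed this way is exactly what I am unsure about.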

c_mac