0

When I try to use MATCH_RECOGNIZE in my SQL queries with Python UDFs, I get the error Python Function can not be used in MATCH_RECOGNIZE for now.

For example, the following is not supported:

SELECT T.aa as ta
FROM MyTable
MATCH_RECOGNIZE (
  ORDER BY proctime
  MEASURES
    A.a as aa,
    pyFunc(1,2) as bb
  PATTERN (A B)
  DEFINE
    A AS a = 1,
    B AS b = 'b'
) AS T

This raises a few questions:

  1. Why would it take for the Blink planner to support Python functions?

  2. Where could I find in the documentation this type of lack of support? The docs regarding this feature don't mention Python. Is it expected that I parse through validation tests?

  3. (main question) Is the best alternative to MATCH_RECOGNIZE a user-defined table aggregation Python function? I want to find just two events in sequence (within an hour window). I know I can do this with a self-join but I'd like to see if there's a more efficient/clean possibility.

pledab
  • 120
  • 1
  • 13

1 Answers1

1

As a workaround for not being able to use Python UDFs in the measures clause, it seems like you could produce as output from the MATCH_RECOGNIZE the data needed as input to the UDF, and then apply the UDF in a subsequent step.

Something like this:

SELECT
  T.aa AS ta, 
  pyFunc(T.one, T.two) AS tb
FROM MyTable
MATCH_RECOGNIZE (
  ORDER BY proctime
  MEASURES
    A.a AS aa,
    1 AS one,
    2 AS two
  PATTERN (A B)
  DEFINE
    A AS a = 1,
    B AS b = 'b'
) AS T

Using a self-join with an interval constraint on the time attributes should produce an efficient plan, should you decide to use that approach instead.

David Anderson
  • 39,434
  • 4
  • 33
  • 60
  • I think I'd be inclined to go towards the self-join since it's slightly less clear for me how to filter for complex sequences without python UDFs in the DEFINE section. But glad that that's also advisable, thank you! – pledab Jul 27 '21 at 18:43
  • Apologies that this is only tangentially related, but I know LATERAL TABLE is used for table functions in SQL, is there a SQL equivalent for table aggregation functions? It seems the docs only show flatAggregate. – pledab Jul 27 '21 at 18:44
  • Update for those curious, table aggregation functions are not supported in Flink SQL. Use normal aggregation functions instead. – pledab Aug 03 '21 at 22:36