
I'm trying to generate a row number with row_number() in Apache Beam SQL, using the code below:

PCollection<Row> rwrtg =
        PCollectionTuple.of(new TupleTag<>("trrtg"), rrtg)
                        .apply(SqlTransform.query("select appId, row_number() over (partition by appId order by rating asc) as issue_rank from trrtg"));

But I get the following error:

java.lang.RuntimeException: cannot translate call ROW_NUMBER() OVER (PARTITION BY $t0 ORDER BY $t1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

Could you please advise how to fix this?

Mikhail Berlyant
Syed Mohammed Mehdi

1 Answer


Beam SQL supports two dialects:

  1. Beam Calcite - Details can be found here regarding supported operators.
  2. ZetaSQL - Details can be found here regarding supported operators

Neither of these dialects currently supports the row_number() analytic function, which is why you are getting the error.
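One possible workaround is to compute the row number outside SQL. The sketch below is plain Java, not Beam API code: it only shows the per-partition logic that `row_number() over (partition by appId order by rating asc)` describes. In a pipeline, this logic would presumably sit in a DoFn applied after a GroupByKey on appId. The class names `RowNumberSketch`, `Rating` and `Ranked` are hypothetical, and the rating column is assumed to be an integer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: emulates
// row_number() over (partition by appId order by rating asc).
class RowNumberSketch {

    // Assumed input shape: one (appId, rating) pair per row; rating assumed to be an int.
    static final class Rating {
        final String appId;
        final int rating;

        Rating(String appId, int rating) {
            this.appId = appId;
            this.rating = rating;
        }
    }

    // Assumed output shape: the input row plus its 1-based rank within the partition.
    static final class Ranked {
        final String appId;
        final int rating;
        final long issueRank;

        Ranked(String appId, int rating, long issueRank) {
            this.appId = appId;
            this.rating = rating;
            this.issueRank = issueRank;
        }
    }

    static List<Ranked> rowNumber(List<Rating> rows) {
        // PARTITION BY appId: bucket rows per appId, keeping first-seen key order.
        Map<String, List<Rating>> partitions = new LinkedHashMap<>();
        for (Rating row : rows) {
            partitions.computeIfAbsent(row.appId, k -> new ArrayList<>()).add(row);
        }

        List<Ranked> result = new ArrayList<>();
        for (List<Rating> partition : partitions.values()) {
            // ORDER BY rating ASC: the sort is stable, so ties keep their input order.
            partition.sort(Comparator.comparingInt((Rating r) -> r.rating));
            long rank = 0;
            for (Rating row : partition) {
                result.add(new Ranked(row.appId, row.rating, ++rank));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rating> rows = new ArrayList<>();
        rows.add(new Rating("a", 5));
        rows.add(new Rating("b", 1));
        rows.add(new Rating("a", 2));
        for (Ranked r : rowNumber(rows)) {
            System.out.println(r.appId + " " + r.rating + " " + r.issueRank);
        }
    }
}
```

Note that this sorts each partition in memory, so it only suits cases where the rows for a single appId fit on one worker.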

Jayadeep Jayaraman
  • If I need row_number(), how can I generate it in Beam SQL? Can it be done with a UDAF? Any suggestions, please? – Syed Mohammed Mehdi Apr 20 '20 at 16:35
  • Note that BeamSQL has started to support OVER and window clauses: https://issues.apache.org/jira/browse/BEAM-9198. With this syntax support, BeamSQL could implement row_number() in the near term. – Rui Wang Jul 24 '20 at 17:12
  • Here is a table that lists the functions that are planned to be implemented: https://docs.google.com/document/d/1nUbV45iL_avgAewYYTkyHHJWY8ZaVcFuky-dQ-pcE0M/edit#heading=h.6whiyp3fzu7o – Rui Wang Jul 24 '20 at 17:13