3

Is there any equivalent function in Bigquery using Standard SQL for max_by and regr_slope functions. If not how to achieve this.

Thanks, Maniyar

Maniyar
  • 39
  • 1
  • 2

2 Answers2

1

It does not appear that BigQuery supports either of these functions out of the box. For MAX_BY, you may just use ROW_NUMBER:

-- replacement for MAX_BY(col, val)
-- find the value for col which generates the MAX value

WITH cte AS (
    SELECT col, ROW_NUMBER() OVER (ORDER BY val DESC) rn
    FROM yourTable
)

SELECT col
FROM cte
WHERE rn = 1;

Or, you could use LIMIT if your version of BigQuery supports that:

SELECT col
FROM yourTable
ORDER BY val DESC
LIMIT 1;

As for doing linear regressions in BigQuery, I refer you to this highly upvoted Stack Overflow question and answer.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • The first query in the above example (ROW_NUMBER) requires a PARTITION BY clause. My further requirement is I need to join this query output to another result set. The size of the table is 100GB, will there be any problem in terms of performance and the resource limit – Maniyar Jan 10 '19 at 13:25
  • Well since BigQuery does not support indexing AFAIK, then yes, there may be a peformance penalty for such a large table. – Tim Biegeleisen Jan 10 '19 at 13:49
  • Thanks @Tim Biegeleisen. Is there any other alternate in my scenario – Maniyar Jan 10 '19 at 15:21
  • Note that I can readily think of. If BigQuery supports triggers, you could try to implement a trigger which compares a new incoming value against the current max, and if larger, then writes that new max value to a second table. – Tim Biegeleisen Jan 10 '19 at 15:23
  • Linear regression: BigQuery now can `CREATE MODEL`, and then predict with `SELECT FROM ML.PREDICT`. There's a better alternative to `ROW_NUMBER() OVER`, which is `ARRAY_AGG(x ORDER BY y LIMIT 1)` – Felipe Hoffa Jan 10 '19 at 22:31
1

As @felipe-hoffa commented in another question, the shortest way to replace MAX_BY seems to use ARRAY_AGG like this:

SELECT
  ARRAY_AGG(x ORDER BY y DESC LIMIT 1)[OFFSET(0)] AS max_x_by_y
FROM yadda_yadda

Ronie
  • 530
  • 5
  • 9