1

From the Postgres documentation (https://www.postgresql.org/docs/9.6/sql-createaggregate.html) I find it hard to deduce what the parameter SORTOP does. Is this option only applicable to an ordered-set aggregate?

Concretely I'm trying to create an aggregate function that finds the most frequent number in a column of numbers. I thought specifying the SORTOP option would sort the data before executing my self defined aggregate function, but this doesn't seem to be the case.

Here is my current implementation that only works when the input data is sorted. It loops over the rows and keeps track of the largest sequence of previous numbers (largfreq variables in state) and the amount of repetitions seen so far of the number that it's currently on (currfreq variables in state).

CREATE TYPE largfreq_state AS (
    largfreq_val INT,
    largfreq INT,
    currfreq_val INT,
    currfreq INT
);

CREATE FUNCTION slargfreq(state largfreq_state, x INT) RETURNS largfreq_state AS $$
BEGIN
    if state.currfreq_val <> x then
        if state.currfreq >= state.largfreq then
            state.largfreq = state.currfreq;
            state.largfreq_val = state.currfreq_val;
        end if; 
        state.currfreq = 1;
        state.currfreq_val = x;
    else
        state.currfreq = state.currfreq + 1;
    end if;
    return state;
END;
$$ language plpgsql;

CREATE FUNCTION flargfreq(state largfreq_state) RETURNS INT AS $$
BEGIN
    if state.currfreq >= state.largfreq then
        return state.currfreq_val;
    else
        return state.largfreq_val;
    end if;
END;
$$ language plpgsql;

CREATE AGGREGATE largfreq(INT) (
    SFUNC = slargfreq,
    STYPE = largfreq_state,
    FINALFUNC = flargfreq,
    INITCOND = '(0, 0, 0, 0)',
    SORTOP = <
);

1 Answers1

1

This is well explained in the documentation:

Aggregates that behave like MIN or MAX can sometimes be optimized by looking into an index instead of scanning every input row. If this aggregate can be so optimized, indicate it by specifying a sort operator. The basic requirement is that the aggregate must yield the first element in the sort ordering induced by the operator; in other words:

SELECT agg(col) FROM tab;

must be equivalent to:

SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;

So you need that for aggregates that can be calculated using an index scan.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263