You can do this in one of two ways:
- Remove duplicates from the resulting arrays using the
array_distinct
function:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_distinct(array_agg(message)) AS sequence
FROM mytable
GROUP BY id
- Use the
DISTINCT
qualifier in the aggregation to remove the duplicate values before they are passed into array_agg.
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'bar'), (1, 'baz'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'), (3, 'foo'),
(3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(DISTINCT message) AS sequence
FROM mytable
GROUP BY id
Both alternatives produce the same result:
sequence
-----------------
[foo, bar, baz]
[foo, bar, baz]
[foo, bar, baz]
(3 rows)
UPDATE: You can remove repeated sequences of elements with the recently introduced MATCH_RECOGNIZE
feature:
WITH mytable(id, message) AS (VALUES
(1, 'foo'), (1, 'foo'), (1, 'bar'), (1, 'baz'), (1, 'bar'), (1, 'baz'),
(2, 'foo'), (2, 'bar'), (2, 'bar'), (2, 'bar'), (2, 'baz'),
(3, 'foo'), (3, 'foo'), (3, 'foo'), (3, 'bar'), (3, 'bar'), (3, 'baz')
)
SELECT array_agg(value) AS sequence
FROM mytable
MATCH_RECOGNIZE(
PARTITION BY id
MEASURES A.message AS value
PATTERN (A B*)
DEFINE B AS message = PREV(message)
)
GROUP BY id