STRING_AGG ignores GROUP BY in PostgreSQL

Question

I have prepared an SQL Fiddle for my question -

In a 2-player word game I store players and their games in the 2 tables:

CREATE TABLE players (
    uid SERIAL PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE games (
    gid SERIAL PRIMARY KEY,
    player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
    player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);

And the letter tiles placing moves and resulting words and scores are stored in another 2 tables:

CREATE TABLE moves (
    mid BIGSERIAL PRIMARY KEY,
    uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
    gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
    played timestamptz NOT NULL,
    tiles jsonb NOT NULL
);

CREATE TABLE scores (
    mid     bigint  NOT NULL REFERENCES moves ON DELETE CASCADE,
    uid     integer NOT NULL REFERENCES players ON DELETE CASCADE,
    gid     integer NOT NULL REFERENCES games ON DELETE CASCADE,
    word    text    NOT NULL CHECK(word ~ '^[A-Z]{2,}$'),
    score   integer NOT NULL CHECK(score >= 0)
);

Here I fill the above tables with test data containing a game and 2 players (Alice and Bob):

INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);

Their interchanging moves are below, sometimes a single move can produce 2 words:

INSERT INTO moves (uid, gid, played, tiles) VALUES
(1, 1, now() + interval '1 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb), 
(2, 1, now() + interval '2 min', '[{"col": 7, "row": 12, "value": 3, "letter": "X"}, {"col": 8, "row": 12, "value": 10, "letter": "Y"}, {"col": 9, "row": 12, "value": 1, "letter": "Z"}]
'::jsonb), 
(1, 1, now() + interval '3 min', '[{"col": 7, "row": 12, "value": 3, "letter": "K"}, {"col": 8, "row": 12, "value": 10, "letter": "L"}, {"col": 9, "row": 12, "value": 1, "letter": "M"}, {"col": 10, "row": 12, "value": 2, "letter": "N"}]
'::jsonb), 
(2, 1, now() + interval '4 min', '[]'::jsonb), 
(1, 1, now() + interval '5 min', '[{"col": 7, "row": 12, "value": 3, "letter": "A"}, {"col": 8, "row": 12, "value": 10, "letter": "B"}, {"col": 9, "row": 12, "value": 1, "letter": "C"}, {"col": 10, "row": 12, "value": 2, "letter": "D"}]
'::jsonb), 
(2, 1, now() + interval '6 min', '[{"col": 7, "row": 12, "value": 3, "letter": "P"}, {"col": 8, "row": 12, "value": 10, "letter": "Q"}]
'::jsonb);

INSERT INTO scores (mid, uid, gid, word, score) VALUES
(1, 1, 1, 'ABCD', 40),
(2, 2, 1, 'XYZ', 30),
(2, 2, 1, 'XAB', 30),
(3, 1, 1, 'KLMN', 40),
(3, 1, 1, 'KYZ', 30),
(5, 1, 1, 'ABCD', 40),
(6, 2, 1, 'PQ', 20),
(6, 2, 1, 'PABCD', 50);

As you can see above, the tiles column is always a JSON list of objects.

But I only need to retrieve the single property of the objects: letter.

So here is my SQL-code (to be used in PHP script displaying player moves in a certain game):

SELECT 
    STRING_AGG(x->>'letter', ''),
    STRING_AGG(y, ', ')
FROM (
    SELECT 
        JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
        FORMAT('%s (%s)', s.word, s.score) AS y
    FROM moves m
    LEFT JOIN scores s
    USING (mid)
    WHERE m.gid = 1
    GROUP BY mid, s.word, s.score
    ORDER BY played ASC
) AS z;

Unfortunately, it does not work as expected.

The both STRING_AGG calls put everything together in two huge strings, despite me trying to GROUP BY mid:

Is there please a way to split the resulting strings by a mid (aka move id)?

UPDATE:

My problem isn't the sorting. My problem is that I get 2 huge strings, while I would expect multiple strings, one pair per move id (aka mid).

Here is my expected output, does anybody please have a suggestion on how to achieve it?

mid   "concatenated 'letter' from JSON"   "concatenated words and scores"
 1                  'ABCD'                       'ABCD (40)'
 2                  'XYZ'                        'XYZ (30), XAB (30)'               
 3                  'KLMN'                       'KLMN (40), KYZ (30)'
 5                  'ABCD'                       'ABCD (40)'
 6                  'PQ'                         'PQ (20), PABCD (50)'

UPDATE #2:

I have followed the suggestion by Laurenz (thank you! Here the SQL Fiddle) with:

SELECT 
    mid,
    STRING_AGG(x->>'letter', '') AS tiles,
    STRING_AGG(y, ', ') AS words
FROM (
    SELECT 
        mid,
        JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
        FORMAT('%s (%s)', s.word, s.score) AS y
    FROM moves m
    LEFT JOIN scores s
    USING (mid)
    WHERE m.gid = 1
) AS z
GROUP BY mid
ORDER BY mid;

But for some reason the "word (score)" entries are multiplied:

Could you provide an example of the output you want? – 404 Mar 16 '18 at 10:33 — 404, Mar 16 '18 at 10:33

Laurenz Albe · Accepted Answer · 2018-03-16T13:53:53.103

2

If you want to group by mid, you have to add that column to the SELECT list of the inner query and add GROUP BY mid to the outer query.

You can use DISTINCT inside the aggregate to remove duplicates:

SELECT 
    mid,
    STRING_AGG(DISTINCT x->>'letter', '') AS tiles,
    STRING_AGG(DISTINCT y, ', ') AS words
FROM (
    SELECT 
        mid,
        JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
        FORMAT('%s (%s)', s.word, s.score) AS y
    FROM moves m
    LEFT JOIN scores s
    USING (mid)
    WHERE m.gid = 1
) AS z
GROUP BY mid;

ORDER BY mid;

edited Mar 16 '18 at 13:53

answered Mar 16 '18 at 10:34

Laurenz Albe

209,280
17
206
263

Laurenz, thank you for your suggestion. I [have tried it out](http://sqlfiddle.com/#!17/8f294/8) but for some reason the "word (score)" strings are multiplied - please see the updated question – Alexander Farber Mar 16 '18 at 13:33
Unfortunately, I get the [ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list](http://sqlfiddle.com/#!17/8f294/20) – Alexander Farber Mar 16 '18 at 13:48
I don't wont to sound nitpicking, but my test was not so god in the sense that all letters where different. If I change my test to AA, XX, KK then [the letters are omitted](http://sqlfiddle.com/#!17/4ef8b/2). But I guess, SQL can not solve every requirement or is there a way? Should I perform a different kind of JOIN? – Alexander Farber Mar 16 '18 at 14:04

Gordon Linoff · Answer 2 · 2018-03-16T12:22:42.257

If you want the results in a particular ordering, then using the order by clause in the aggregation call, as described in the documentation:

SELECT STRING_AGG(x->>'letter', '' ORDER BY played),
       STRING_AGG(y, ', ' ORDER BY played)
FROM (SELECT JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
             FORMAT('%s (%s)', s.word, s.score) AS y
      FROM moves m LEFT JOIN
           scores s
           USING (mid)
      WHERE m.gid = 1
      GROUP BY mid, s.word, s.score
     ) z;

As for using a subquery, note the documentation:

This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

I guess you have found a case where "usually" doesn't apply. The safer method is the one using explicit syntax.

EDIT:

Your outer query is an aggregation query that returns one row. So everything is brought together.

If you want one row per mid, you need a GROUP BY in the outer query:

SELECT STRING_AGG(x->>'letter', '' ORDER BY played),
       STRING_AGG(y, ', ' ORDER BY played)
FROM (SELECT JSONB_ARRAY_ELEMENTS(m.tiles) AS x,
             FORMAT('%s (%s)', s.word, s.score) AS y
      FROM moves m LEFT JOIN
           scores s
           USING (mid)
      WHERE m.gid = 1
      GROUP BY mid, s.word, s.score
     ) z
GROUP BY mid;

Gordon, thank you - I [have tried your suggestion](http://sqlfiddle.com/#!17/8f294/15) but the "words (scores)" column is duplicated/triplicated for some reason. — Alexander Farber, Mar 16 '18 at 13:45

score 0 · Answer 3 · answered Mar 20 '18 at 09:24

I have been able to get rid of DISTINCT by using CTE (here SQL Fiddle):

WITH cte1 AS (
SELECT 
    mid,
    STRING_AGG(x->>'letter', '') AS tiles
FROM (
        SELECT 
            mid,
            JSONB_ARRAY_ELEMENTS(tiles) AS x
        FROM moves
        WHERE gid = 1
) AS z
GROUP BY mid),
cte2 AS (
        SELECT 
        mid,
        STRING_AGG(y, ', ') AS words
    FROM (
        SELECT 
            mid,
            FORMAT('%s (%s)', word, score) AS y
        FROM scores
        WHERE gid = 1
) AS z
GROUP BY mid)
SELECT 
    mid, 
    tiles, 
    words 
FROM cte1 
JOIN cte2 using (mid) 
ORDER BY mid ASC;

STRING_AGG ignores GROUP BY in PostgreSQL

3 Answers3