1

Let's say I create two tables using the following SQL, such that post has many comment:

CREATE TABLE IF NOT EXISTS post (
            id          SERIAL PRIMARY KEY,
            title       VARCHAR NOT NULL,
            text        VARCHAR NOT NULL
        )

CREATE TABLE IF NOT EXISTS comment (
            id          SERIAL PRIMARY KEY,
            text        VARCHAR NOT NULL,
            post_id  SERIAL REFERENCES post (id)
        )

I would like to be able to query these tables so as to serve a response that looks like this:

{
  "post" : [
    { id: 100,
      title: "foo",
      text: "foo foo",
      comment: [1000,1001,1002] },
    { id: 101,
      title: "bar",
      text: "bar bar",
      comment: [1003] }
  ],
  "comment": [
    { id: 1000,
      text: "bla blah foo",
      post: 100 },
    { id: 1001,
      text: "bla foo foo",
      post: 100 },
    { id: 1002,
      text: "foo foo foo",
      post: 100 },
    { id: 1003,
      text: "bla blah bar",
      post: 101 },
  ]
}

Doing this naively would involve to SELECT statements, the first along the lines of

SELECT DISTINCT ON(post.id), post.title, post.text, comment.id
FROM post, comment
WHERE post.id = comment.post_id

... and the second something along the lines of

SELECT DISTINCT ON(comment.id), comment.text, post.id
FROM post, comment
WHERE post.id = comment.post_id

However, I cannot help but think that there is a way to do this involving only one SELECT statement - is this possible?


Notes:

  • I am using Postgres, but I do not require a Postgres-specific solution. Any standard SQL solution should do.
  • The queries above are illustrative only, they do not give we exactly what is necessary at the moment.
  • It looks like what the naive solution here does is perform the same join on the same two tables, just doing a distinct on a different table each time. This definitely leaves room for improvement.
  • It appears that ActiveModel Serializers in Rails already do this - if someone familair with them would like to chime in how they work under the hood, that would be great.
bguiz
  • 27,371
  • 47
  • 154
  • 243
  • What do you mean "side loaded query"? Maybe "site loading query"? – Hogan Aug 08 '14 at 14:04
  • @Hogan I do mean side-loading, it is when a (RESTful) API includes not just the resource requested, but also the resources it is related to, in a single response. – bguiz Aug 08 '14 at 14:35
  • @Hogan Here is a resource I found of someone doing something similar: http://programmingarehard.com/2013/11/10/eloquent_and_views.html ... and a description of side loading: http://emberjs.com/guides/models/the-rest-adapter/#toc_sideloaded-relationships – bguiz Aug 08 '14 at 14:46
  • "Side loading" here is a description of your API and your API design. "Side loading" is not a modifier on the query -- it has nothing to do with the SQL. SQL queries are not "side loaded" and never will be. – Hogan Aug 08 '14 at 14:50
  • @Hogan ah yes, perhaps I should have been more clear, by query I meant API query, not SQL query. – bguiz Aug 08 '14 at 14:53

2 Answers2

1

You need two queries to get the form you laid out:

SELECT p.id, p.title, p.text, array_agg(c.id) AS comments
FROM   post p
JOIN   comment c ON c.post_id = p.id
WHERE  p.id = ???
GROUP  BY p.id;

Or faster, if you really want to retrieve all or most of your posts:

SELECT p.id, p.title, p.text, c.comments
FROM   post p
JOIN  (
   SELECT post_id, array_agg(c.id) AS comments
   FROM   comment
   GROUP  BY 1
   ) c ON c.post_id = p.id
GROUP  BY 1;

Plus:

SELECT id, text, post_id
FROM   comment
WHERE  post_id = ??;

Single query

SQL can only send one result type per query. For a single query, you would have to combine both tables, listing columns for post redundantly. That conflicts with the desired response in your question. You have to give up one of the two conflicting requirements.

SELECT p.id, p.title, p.text AS p_text, c.id, c.text AS c_text
FROM   post p
JOIN   comment c ON c.post_id = p.id
WHERE  p.id = ???

Aside: The column comment.post_id should be integer, not serial! Also, column names are probably just for a quick show case. You wouldn't use the non-descriptive text as column name, which also conflicts with a basic data type. Compare this related case:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Yes there is something fishy here -- his queries don't map to his results (yours do). – Hogan Aug 08 '14 at 14:03
  • @ erwin - What is the value of `??`. If that is a single post's ID, then I'm afraid that this will not work, because I need to be able to specify multiple post IDs, and select them as well as all of the comments that appear in them. See the sample response object in my question, it contains two posts. – bguiz Aug 08 '14 at 14:41
  • Thanks for your tip on serial vs pointers, that's handy to know! Regarding agenda that is not showing in my question, not really, I just want to know if there a more performant way of accomplishing this task than a naive solution which involves two select statements that join the same two tables, and simply do a distinct on different columns. – bguiz Aug 08 '14 at 14:49
  • @bguiz: The value of `??` is for you to decide, or better yet, define in your question to begin with. It could also be `p.id = ANY ('{1,2,3}')` for multiple IDs. – Erwin Brandstetter Aug 08 '14 at 14:59
1

However, I cannot help but think that there is a way to do this involving only one SELECT statement - is this possible?

Technically: yes. If you really want your data in json anyway, you could use PostgreSQL (9.2+) to generate it with the json functions, like:

SELECT row_to_json(sq)
FROM (
  SELECT array_to_json(ARRAY(
           SELECT row_to_json(p)
           FROM (
             SELECT *, ARRAY(SELECT id FROM comment WHERE post_id = post.id) AS comment
             FROM post
           ) AS p
         )) AS post,
         array_to_json(ARRAY(
           SELECT row_to_json(comment)
           FROM comment
         )) AS comment
) sq;

But I'm not sure it's worth it -- usually not a good idea to dump all your data without limit / pagination.

SQLFiddle

pozs
  • 34,608
  • 5
  • 57
  • 63
  • There are 5 selects within that compound statement, how much of a performance hit would this entail, compared to doing two separate queries, such as the method suggested by @erwin? – bguiz Aug 09 '14 at 00:25
  • @bguiz that is one of the reasons I wrote *I'm not sure it's worth it* – pozs Aug 11 '14 at 08:45