0

I was just wondering how to deal with eager loading queries to retrieve data with several associations, let me explain my models:

type User struct {
  Id uint
  Name string
  Username string
  Image string
}

type Post struct {
  Id unit
  Content string
  Author *User // I use user_id in author column for the posts table
  Comments []Comments
}

type Comments struct {
  Id unit
  PostId uint
  Message string
  CommentBy *User // I use user_id in comment_by column for the comments table
}

This is an example of retrieved data:

{
    "Posts": [
        {
            "Id": 1,
            "Content": "test post",
            "Author": {
                  "Id": 1,
                  "Name": "author",
                  "Username": "h3ll0",
                  "Image": "author.png"
            },
            "Comments": [
                {
                    "Id": 1,
                    "PostId": 1,
                    "Message": "good article",
                    "CommentBy": {
                        "Id": 2,
                        "Name": "second user",
                        "Username": "r3ader",
                        "Image": "reader.png"
                    }
                },
                {
                    "Id": 2,
                    "PostId": 1,
                    "Message": "bad article",
                    "CommentBy": {
                        "Id": 3,
                        "Name": "third user",
                        "Username": "thirD",
                        "Image": "third.png"
                    }
                }
            ]
        }
    ]
}

I want to retrieve the post data with all nested associations and map them to the Post struct. How would you do it with database/sql and pq (Postgres driver)?

I don't want to use any ORM at all, I only use database/SQL and SQLx.

Keep in mind that the performance is important because The posts and comments tables are big enough.

Jonathan Hall
  • 75,165
  • 16
  • 143
  • 189
Hossein
  • 115
  • 1
  • 11
  • 1
    I'd say you make 3 calls: get User, get all Posts for User and get all Comments for all Post ids you got in previous call. Then merge on client. Make a single call that joins the 3 tables in database is probably slower (especially if the tables are big), but I'd write a benchmark to make sure it actually is slower. Last option is to use another database that is better suited for this type of data. One thing a relational database is bad at is `relations`. A graph database or a document-based database could be better suited. As I don't know your entire use case I can't make that decision. – TehSphinX Sep 04 '20 at 08:28
  • thanks for commenting, I work on Information oriented application. I considered using NoSQL DB like Cassandra or even graph databases like [dgraph](https://dgraph.io/), but currently, I use Postgres. I simplified the models in the example, the real data models have a complex structure, imagine that a post can have multiple attachments, contributors, comments and an attachment itself has multiple links, and I want to populate aggregated data at once, It would be great if you help me and suggest the proper database solution for it, we are at an early stage, the DB migration would not be hard. – Hossein Sep 04 '20 at 08:37
  • draph is a great way to work with data that has a lot of relations by ID. I looks like it could work for your use case. We used it in a project recently but had to abandon it as we had a relation by time frames that was not really suited for dgraph (or we did not find a way to properly accomplish this). – TehSphinX Sep 04 '20 at 08:41
  • great, and what's about the dgraph go client? Is it mature enough to use it at production? – Hossein Sep 04 '20 at 08:43
  • We also have cassandra in production. The use case of cassandra is reliable response times in the 10-20ms timeframe with huge amounts of data. Cassandra does not have joins and you cannot query the data except by primary key (which can be multiple columns). There can be secondary keys but can significantly cost performance. – TehSphinX Sep 04 '20 at 08:45
  • @TehSphinX, which one you prefer for the mentioned data? Cassandra or dgraph? – Hossein Sep 04 '20 at 08:47
  • If any dgraph client is mature, it is the Go one as DGraph itself is written in Go. But we found that sometimes DGraph seems to behave strangely -- I think because it is very early in its development. But I can't say for sure as we didn't dig into every strange behaviour. There are big companies using it... – TehSphinX Sep 04 '20 at 08:48
  • DGraph fits better I think although I'd still be careful to get to know it first. Cassandra doesn't fit the model better than PostGREs. Unless you will have terrabytes of data in those tables, then cassandra is the best option I know so far. That is what I chose it for: no matter the amount of data I put in there, the write and read performance do not drop. And if they do one day, I can just add more nodes to the cluster. – TehSphinX Sep 04 '20 at 08:52
  • But in the end you'll have to choose. If you'd ask someone else, you'd probably get completely different answers. And you are the one that has to live with your decision. – TehSphinX Sep 04 '20 at 08:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/220960/discussion-between-hossein-nazari-and-tehsphinx). – Hossein Sep 04 '20 at 08:55

0 Answers0