1

Having a list of books that points to a list of authors, I want to display a tree, having in each node the author name and the number of books he wrote. Initially, I have embedded the authors[] array directly into books collection, and this worked like a charm, using the magic of aggregation framework. However, later on, I realise that it would be nice to have some additional information attached to each author (e.g. it's picture, biographical data, birth date, etc). For the first solution, this is bad because:

  • it duplicates the data (not a big deal, and yes, I know that mongo's purpose is to encapsulate full objects, but let's ignore that for now);
  • whenever an additional property is created or updated on the old records won't benefit from this change, unless I specifically query for some unique old property and update all the book authors with the new/updated values.

The next thing was to use the second collection, called authors, and each books document is referencing a list of author ids, like this:

{
    "_id" : ObjectId("58ed2a254374473fced950c1"),
    "authors" : [ 
        "58ed2a254d74s73fced950c1", 
        "58ed2a234374473fce3950c1"
    ],
    "title" : "Book title"
....
}

For getting the author details, I have two options:

  • make an additional query to get the data from the author collection;
  • use DBRefs.

Questions:

  1. Using DBRefs automatically loads the authors data into the book object, similar to what JPA @MannyToOne does for instance?
  2. Is it possible to get the number of written books for each author, without having to query for each author's book count? When the authors were embedded, I was able to aggregate the distinct author name's and also the number of book documents that he was present on. Is such query possible between two collections?

What would be your recommendation for implementing this behaviour? (I am using Spring Data)

hypercube
  • 958
  • 2
  • 16
  • 34
  • For 2 [lookup](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) will help – s7vr Apr 15 '17 at 11:24
  • I was able to use your suggestion, thank you very much! Please write an answer so that I can award you the accepted answer to my question. Also, you might wanna have a look at my solution. It involves two steps: step one which will use your $lookup suggestion and step two, an additional query to get the number of books with no authors, but that's perfectly fine! I was fearing that I might to to 500 queries for 500 authors, which was not acceptable. – hypercube Apr 15 '17 at 11:58

2 Answers2

2

You can try the below query in the spring mongo application.

UnwindOperation unwindAuthorIds = Aggregation.unwind("authorsIds", true);
LookupOperation lookupAuthor = Aggregation.lookup("authors_collection", "authorsIds", "_id", "ref");
UnwindOperation unwindRefs = Aggregation.unwind("ref", true);
GroupOperation groupByAuthor = Aggregation.group("ref.authorName").count().as("count");

Aggregation aggregation = Aggregation.newAggregation(unwindAuthorIds, lookupAuthor, unwindRefs, groupByAuthor);

List<BasicDBObject> results = mongoOperations.aggregate(aggregation, "book_collection", BasicDBObject.class).getMappedResults();
s7vr
  • 73,656
  • 11
  • 106
  • 127
  • As for mongo 3.4+, it seems that the local array unwind is no longer necessary, as shown here https://jira.mongodb.org/browse/SERVER-22881. The ref unwind was a nice trick though :). – hypercube Apr 18 '17 at 01:58
1

Following @Veeram's suggestion, I was able to write this query:

db.book_collection.aggregate([
    {
        $unwind: "$authorsIds"
    },
    {
        $lookup: {
            from: "authors_collection",
            localField: "authorsIds",
            foreignField: "_id",
            as: "ref"
        }
    },
    {$group: {_id: "$ref.authorName", count: {$sum: 1}}}
])

which returns something like this:

{
    "_id" : [ 
        "Paulo Coelho"
    ],
    "count" : 1
}

/* 2 */
{
    "_id" : [ 
        "Jules Verne"
    ],
    "count" : 2
}

This is exactly what I needed, and it sounds about right. I only need to do an additional query now to get the books with no author set.

hypercube
  • 958
  • 2
  • 16
  • 34
  • You don't need second query if you use `{ $unwind: { path: "$authorIds", preserveNullAndEmptyArrays: true } }` – s7vr Apr 15 '17 at 12:48