Managing relationships with MongoDb in a Microservices architecture

Question

I've been working with microservices for some time now, always with relational databases. I am now looking at MongoDB and I am not sure how to handle entity relationships that span different microservices. Here is an example:

public class Employee implements Serializable {
   private String id;
   ...
}

public class Department implements Serializable {
    private String id;
    private String desc;
    private List<Employee> employees = new ArrayList<>();
    ...
}

These two entities are managed by two different microservices, with a one-to-many relationship managed by the Department entity. So far, so good.

With a relational database (being an optional relationship and with the possibility of one employee belonging to several departments) I'd map this in the Departments microservice with one table containing two fields: employee_id and department_id. When the client calls getDepartmentWithEmployees(depId) this microservice will read the table and obtain the proper employees from the Employees microservice.

But in a MongoDB database, as far as I know, when I store a Department object it stores all associated Employees. Is not that duplicating the information? Is there a way for MongoDB to store just the employees' ids rather than all of their data? Or is there another answer?

I am pretty sure this is a very basic question, but I am new to all this stuff.

Thanks in advance.

`List<Employee> employees` - this can be just `List<EmployeeRef> employees`, where the `EmployeeRef` object holds attributes like the employee's id and name (which is duplicated information, but rarely changes). — prasad_, Mar 17 '21 at 15:45
Thanks for the tip, it seems a good approach. I'll have a look at it. — didgewind, Mar 18 '21 at 11:02
In MongoDB, data that is accessed together should be stored together. There is no strict rule about how to store employees and departments. You should implement it in the way that best serves your needs. — Jonathan Orrego, Mar 20 '22 at 12:28

score 4 · Accepted Answer · answered Mar 17 '21 at 15:47

But, in a MongoDB database, as far as I know, when I store a Department object it stores all associated Employees. Is not that duplicating the information?

First of all, the statement above is not quite correct. From MongoDB's perspective, whatever is provided as BSON is stored as is. If you provide the employees together with the department then yes, they are stored with it. You can also apply partial updates after creating the department (e.g. using the $set operator). But I think the scope of your question is broader than this.
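
To make the point concrete, here is a minimal sketch in plain Java (no MongoDB driver involved) of a department that carries only employee ids, so only those ids would end up in the stored document. The class name `DepartmentDoc`, the field name `employeeIds`, and the `toDocument` helper are assumptions made for illustration; a `Map` stands in for the BSON document that a driver or mapper would produce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant of the Department class from the question:
// it references employees by id instead of embedding Employee objects.
class DepartmentDoc {
    private final String id;
    private final String desc;
    private final List<String> employeeIds = new ArrayList<>();

    DepartmentDoc(String id, String desc) {
        this.id = id;
        this.desc = desc;
    }

    void addEmployee(String employeeId) {
        employeeIds.add(employeeId);
    }

    // A Map stands in for the BSON document (assumption): only the
    // fields put here would be stored, nothing else about the employees.
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("desc", desc);
        doc.put("employeeIds", new ArrayList<>(employeeIds));
        return doc;
    }

    public static void main(String[] args) {
        DepartmentDoc dep = new DepartmentDoc("dep-1", "Software");
        dep.addEmployee("emp-46515");
        dep.addEmployee("emp-81584");
        System.out.println(dep.toDocument());
    }
}
```

With this shape the Employees microservice stays the only owner of employee data, and the department document holds nothing but the links.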

IMHO, creating nano-services for each document/table in the database is not a good approach, especially when the services are only responsible for basic CRUD operations. You should first define your bounded contexts, aggregate roots, etc. In short, do not try to design tables before mapping business requirements to domain objects. What I'm trying to say is: use DDD principles :)

These are the strategies that I have found so far. When designing microservices you should also consider the pros and cons of each strategy. (See bottom for references.)

General Principles of Mapping Relational Databases to NoSQL

1:1 Relationship
- Embedding
- Link with Foreign Key
1:M Relationship
- Embedding
- Linking with Foreign Key
- (Hybrid) Bucketing Strategy
N:M Relationship
- Two-Way Referencing
- One-Way Referencing

1:1 Relationship

The 1:1 relation can be mapped in two ways:

- Embed the relationship as a document
- Link to a document in a separate collection

Tables:

// Employee document
{
   "id": 123,
   "Name":"John Doe"
}

// Address document
{
   "City":"Ankara",
   "Street":"Genclik Street",
   "Nr":10
}

Example: Embedding (1:1)

Advantage: Address can be retrieved with a single read operation.


{
  "id": 123,
  "Name":"John Doe",
  "Address": {
    "City":"Ankara",
    "Street":"Genclik Street",
    "Nr":10
  } 
}

Example: Link with foreign key (1:1)

{
   "id": 763541685,  // link this
   "Name":"John Doe"
}

Address document with the employee key:

{
   "employee_id": 763541685,
   "City":"Ankara",
   "Street":"Genclik street",
   "Nr":10
}

1:M Relationship

Initial:

// Department collection
{
  "id": 1,
  "department_name": "Software",
  "department_location": "Amsterdam"
}

// Employee collection
[
    {
      "employee_id": 46515,
      "employee_name": "John Doe"
    },
    {
      "employee_id": 81584,
      "employee_name": "John Wick"
    }
]

Example: Embedding (1:M)

Warning:

- The employee list might be huge!
- Be careful when using this approach in a write-heavy system. The I/O load increases due to housekeeping operations such as indexing and replication.
- Pagination on employees is hard!

{
  "id": 1,
  "department_name": "Software",
  "department_location": "Amsterdam",
  "employees": [
    {
      "employee_id": 46515,
      "employee_name": "John Doe"
    },
    {
      "employee_id": 81584,
      "employee_name": "John Wick"
    }
  ]
}

Example: Linking (1:M)

We can link to the department by storing department_id in each employee document.

- Advantage: Easier pagination.
- Disadvantage: Retrieving all employees that belong to department X takes a separate query on the employee collection, which can require many read operations.

[
    {
      "employee_id": 46515,
      "employee_name": "John Doe",
      "department_id": 1
    },
    {
      "employee_id": 81584,
      "employee_name": "John Wick",
      "department_id": 1
    }
]
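
As a rough illustration of why linking makes pagination easier, the sketch below filters an in-memory list by `department_id` and then applies skip and limit, which is approximately what a filtered query with skip and limit would do on the database side. The classes `LinkedEmployee` and `EmployeePager` are hypothetical names, and the list is only a stand-in for the employee collection.

```java
import java.util.List;
import java.util.stream.Collectors;

// Mirrors the employee documents above: each one carries a department_id link.
class LinkedEmployee {
    final int employeeId;
    final String employeeName;
    final int departmentId;

    LinkedEmployee(int employeeId, String employeeName, int departmentId) {
        this.employeeId = employeeId;
        this.employeeName = employeeName;
        this.departmentId = departmentId;
    }
}

class EmployeePager {
    // Returns one page (zero-based pageIndex) of the employees linked to departmentId.
    static List<LinkedEmployee> page(List<LinkedEmployee> collection,
                                     int departmentId, int pageIndex, int pageSize) {
        return collection.stream()
                .filter(e -> e.departmentId == departmentId)
                .skip((long) pageIndex * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In-memory stand-in for the employee collection (assumption).
        List<LinkedEmployee> employees = List.of(
                new LinkedEmployee(46515, "John Doe", 1),
                new LinkedEmployee(81584, "John Wick", 1));
        for (LinkedEmployee e : page(employees, 1, 0, 1)) {
            System.out.println(e.employeeName);
        }
    }
}
```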

Example: Bucketing Strategy (Hybrid 1:M)

- Useful for cases like time series.
- Hybrid = Embedding + Linking
- Advantage: A single read fetches 100 employees at a time, allowing for efficient pagination.
- See Building with Patterns: The Bucket Pattern

We'll split the employees into buckets with a maximum of 100 employees in each bucket.

{
    "id":1,
    "Page":1,
    "Count":100,
    "Employees":[
        {
            "employee_id": 46515,
            "employee_name": "John Doe"
        },
        {
            "employee_id": 81584,
            "employee_name": "John Wick"
        }
    ]
}
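
The splitting step itself can be sketched as follows. This is only an illustration in plain Java: `EmployeeBuckets` and `toBuckets` are hypothetical names, employees are reduced to plain strings, a `Map` stands in for each bucket document, and treating the bucket's `id` field as the department id is an assumption about the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmployeeBuckets {
    // Splits the employees of one department into bucket documents
    // holding at most bucketSize employees each.
    static List<Map<String, Object>> toBuckets(int departmentId,
                                               List<String> employees,
                                               int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        List<Map<String, Object>> buckets = new ArrayList<>();
        for (int start = 0; start < employees.size(); start += bucketSize) {
            int end = Math.min(start + bucketSize, employees.size());
            Map<String, Object> bucket = new LinkedHashMap<>();
            bucket.put("id", departmentId);          // assumed to be the department id
            bucket.put("Page", buckets.size() + 1);  // pages are numbered from 1
            bucket.put("Count", end - start);
            bucket.put("Employees", new ArrayList<>(employees.subList(start, end)));
            buckets.add(bucket);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> employees = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            employees.add("employee-" + i);
        }
        for (Map<String, Object> bucket : toBuckets(1, employees, 100)) {
            System.out.println("Page " + bucket.get("Page") + ", Count " + bucket.get("Count"));
        }
    }
}
```

Fetching page N then means reading the single bucket whose `Page` equals N, rather than scanning the whole employee list.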

N:M Relationship

To choose between Two-Way Referencing and One-Way Referencing, you must establish the maximum size of N and the maximum size of M.
For example, if N is a maximum of 3 categories for a book and M is a maximum of 5,000,000 books in a category, you should pick One-Way Referencing.
If N is a maximum of 3 and M is a maximum of 5, then Two-Way Referencing might work well. (See schema basics.)

Example: Two-Way Referencing (N:M)

In Two-Way Referencing we include the book foreign keys under the Books field in the author document, and the author foreign keys under the authors field in the book document.

Author collection:

[
    {
       "id":1,
       "Name":"John Doe",
       "Books":[ 1, 2 ]
    },{
       "id":2,
       "Name": "John Wick",
       "Books": [ 2 ]
    }
]

Book collection:

[
    {
       "id": 1,
       "title": "Brave New World",
       "authors": [ 1 ]
    },{
       "id":2,
       "title": "Dune",
       "authors": [ 1, 2 ]
    }
]
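
One consequence of two-way referencing is that every new link has to be written on both sides. The sketch below shows that bookkeeping with two in-memory maps standing in for the author and book collections; `TwoWayRefs` and its methods are hypothetical names. In a real database these would be two separate document updates, so keeping them consistent is the application's job.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class TwoWayRefs {
    // author id -> ids of the books they wrote (the "Books" field)
    private final Map<Integer, Set<Integer>> booksByAuthor = new HashMap<>();
    // book id -> ids of its authors (the "authors" field)
    private final Map<Integer, Set<Integer>> authorsByBook = new HashMap<>();

    // Records the relationship on both sides so either one can be read directly.
    void link(int authorId, int bookId) {
        booksByAuthor.computeIfAbsent(authorId, k -> new LinkedHashSet<>()).add(bookId);
        authorsByBook.computeIfAbsent(bookId, k -> new LinkedHashSet<>()).add(authorId);
    }

    Set<Integer> booksOf(int authorId) {
        return booksByAuthor.getOrDefault(authorId, Set.of());
    }

    Set<Integer> authorsOf(int bookId) {
        return authorsByBook.getOrDefault(bookId, Set.of());
    }

    public static void main(String[] args) {
        TwoWayRefs refs = new TwoWayRefs();
        // Same links as in the author and book documents above.
        refs.link(1, 1);
        refs.link(1, 2);
        refs.link(2, 2);
        System.out.println("Books of author 1: " + refs.booksOf(1));
        System.out.println("Authors of book 2: " + refs.authorsOf(2));
    }
}
```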

Example: One-Way Referencing (N:M)

Example with books and categories: each book belongs to only a few categories, but a single category can contain many books.

Advantage: Optimized read performance.
The references to categories are stored in the book documents because there are far more books in a category than categories in a book.

Category collection:

[
  {
    "id": 1,
    "category_name": "Science Fiction"
  },
  {
    "id": 2,
    "category_name": "Dystopian"
  }
]

An example of a Book document with foreign keys for Categories:

[
    {
      "id": 1,
      "title": "Brave New World",
      "categories": [ 1, 2 ],
      "authors": [ 1 ] 
    },
    {
      "id": 2,
      "title": "Dune",
      "categories": [ 1],
      "authors": [ 1, 2 ] 
    }
]
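
With one-way referencing, "which books are in category X?" is answered from the book side only, by matching on the `categories` array. A minimal in-memory sketch of that lookup follows; `CategorizedBook` and `BooksByCategory` are hypothetical names, and the list stands in for the book collection.

```java
import java.util.List;
import java.util.stream.Collectors;

// Mirrors the book documents above, keeping only the fields needed here.
class CategorizedBook {
    final int id;
    final String title;
    final List<Integer> categories;

    CategorizedBook(int id, String title, List<Integer> categories) {
        this.id = id;
        this.title = title;
        this.categories = categories;
    }
}

class BooksByCategory {
    // Category documents hold no book ids, so the lookup goes through the books.
    static List<String> titlesIn(List<CategorizedBook> books, int categoryId) {
        return books.stream()
                .filter(b -> b.categories.contains(categoryId))
                .map(b -> b.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CategorizedBook> books = List.of(
                new CategorizedBook(1, "Brave New World", List.of(1, 2)),
                new CategorizedBook(2, "Dune", List.of(1)));
        System.out.println(titlesIn(books, 1));
        System.out.println(titlesIn(books, 2));
    }
}
```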

References

Thank you, this is a lot of valuable information. Still, I cannot grasp whether there is an optimal mapping for this kind of relation when working with different microservices (or nanoservices, let's see it as a general question). Should we store just an `EmployeeRef` object, like @prasad_ was suggesting? Or are there other approaches? — didgewind, Mar 18 '21 at 11:08
So I wrote down the general concepts of mapping relations in NoSQL databases. Unfortunately, there is no silver bullet when mapping relations. It really depends on how you save/query the database. It seems @prasad_ is pointing to the **linking strategy** instead of dumping out the whole `Employee` instance. That's perfectly fine, since it's one of the strategies. For the very same purpose, there is an even better way in Spring's MongoDB client implementation, which is `@DBRef`. [Using @DBRefs](https://docs.spring.io/spring-data/mongodb/docs/current/reference/html/#mapping-usage-references) — Ali Can, Mar 18 '21 at 16:18
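
For the cross-service case discussed in these comments, one possible shape is sketched below: the Departments service stores small `EmployeeRef` entries and asks the Employees service for details when it needs them. Everything here is an assumption for illustration: the `EmployeeRef` fields, the `DepartmentReader` class, and the `employeeLookup` function, which merely stands in for a remote call to the Employees microservice. It is not Spring's `@DBRef` mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Lightweight reference stored inside the department document.
class EmployeeRef {
    final String id;
    final String name; // duplicated, rarely changing attribute

    EmployeeRef(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

class DepartmentReader {
    // employeeLookup is a hypothetical stand-in for a call to the Employees
    // microservice; it returns the employee's current details, if available.
    static List<String> resolveEmployees(List<EmployeeRef> refs,
                                         Function<String, Optional<String>> employeeLookup) {
        List<String> result = new ArrayList<>();
        for (EmployeeRef ref : refs) {
            // Fall back to the duplicated name when the other service has no answer.
            result.add(employeeLookup.apply(ref.id).orElse(ref.name));
        }
        return result;
    }

    public static void main(String[] args) {
        List<EmployeeRef> refs = List.of(
                new EmployeeRef("46515", "John Doe"),
                new EmployeeRef("81584", "John Wick"));
        // A lookup that never answers, to show the fallback to the stored names.
        System.out.println(resolveEmployees(refs, id -> Optional.empty()));
    }
}
```

The duplicated name lets the department be displayed without a remote call, at the cost of having to refresh it on the rare occasions when an employee's name changes.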