3

I'm working on Spring Boot v2.1.3.RELEASE & Spring Data Mongo. In this example, I want to apply uniqueness on email & deptName. The combination of email & deptName must be unique and is there any way to get email out since its repeating in each array object ?

I tried below, but it's not working !

@CompoundIndexes({
    @CompoundIndex(name = "email_deptName_idx", def = "{'email' : 1, 'technologyEmployeeRef.technologyCd' : 1}")
})

Sample Data

{
    "_id" : ObjectId("5ec507c72d8c2136245d35ce"),
    ....
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    .....
    .....
    "technologyEmployeeRef" : [ 
        {
            "technologyCd" : "john.doe@gmail.com",
            "technologyName" : "Advisory",
            ....
            .....
            "Status" : "A"
        }, 
        {
           "technologyCd" : "john.doe@gmail.com",
           "technologyName" : "Tax",
           .....
           .....
           "Status" : "A"
       }
    ],
    "phoneCodes" : [ 
        "+352"
    ],
    ....
    ....
}

Technology.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
public class Technology {
    @Indexed(name = "technologyCd", unique = true, sparse = true)
    private String technologyCd;

    @Indexed(name = "technologyName", unique = true, sparse = true)
    private String technologyName;
    private String status;
}

EmployeeTechnologyRef.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class EmployeeTechnologyRef {
    private String technologyCd;
    private String primaryTechnology;
    private String status;
}

Employee.java

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@Document
@CompoundIndexes({
    @CompoundIndex(name="emp_tech_indx", def = "{'employeeTechnologyRefs.primaryTechnology' : 1, 'employeeTechnologyRefs.technologyCd' : 1}" ,unique = true, sparse = true)
})
public class Employee {
    private String firstName;
    private String lastName;
    private String email;
    private List<EmployeeTechnologyRef> employeeTechnologyRefs;
}

I used below code but its not giving me any error of duplicate. How can we do this ?

Technology java8 = Technology.builder().technologyCd("Java").technologyName("Java8").status("A").build();
Technology spring = Technology.builder().technologyCd("Spring").technologyName("Spring Boot2").status("A").build();
List<Technology> technologies = new ArrayList<>();
technologies.add(java8);
technologies.add(spring);

technologyRepository.saveAll(technologies);

EmployeeTechnologyRef t1 = EmployeeTechnologyRef.builder().technologyCd("Java").primaryTechnology("Y")
        .status("A")
        .build();
EmployeeTechnologyRef t2 = EmployeeTechnologyRef.builder().technologyCd("Spring").primaryTechnology("Y")
        .status("A")
        .build();
List<EmployeeTechnologyRef> employeeTechnologyRefs = new ArrayList<>();
employeeTechnologyRefs.add(t1);
employeeTechnologyRefs.add(t2);
employeeTechnologyRefs.add(t1);

Employee employee = Employee.builder().firstName("John").lastName("Kerr").email("john.kerr@gmail.com")
        .employeeTechnologyRefs(employeeTechnologyRefs).build();
employeeRepository.save(employee);
PAA
  • 1
  • 46
  • 174
  • 282
  • There is some useful information related to unique indexes on fields in embedded documents: [How to set unique constraint for field in document nested in array?](https://stackoverflow.com/questions/61655391/how-to-set-unique-constraint-for-field-in-document-nested-in-array) – prasad_ May 22 '20 at 08:50
  • I already went through it and many other similar links does not solve my problems – PAA May 22 '20 at 09:46
  • @prasad_ - Could you please provide definitive answer on this query here ? As this doesn't solves my purposes – PAA May 24 '20 at 10:49
  • Your question could use some editing. It is unclear what you are asking. 1) The first part does not match the second part. First you are asking about email and deptName, then you are suddenly asking about primaryTechnology and technologyCd. Question would be more clear if the question dealt with consistent data. 2) What do you mean by "any way to get email out?" Are you asking about how to remove the field from existing data via some cleanup query? Are you concerned that if you remove it from the array that it may make it impossible to enforce uniqueness across the combination of the 2 fields? – jcarter May 25 '20 at 23:23
  • @jcarter - I apologized. I've corrected all the details – PAA May 26 '20 at 05:44

1 Answers1

4

In MongoDB, a unique index ensures that a particular value in a field is not present in more than one document. It will not guarantee that a value is unique across an array within a single document. This is explained here in the MongoDB Manual where it discusses unique multikey Indexes.

Thus, a unique index will not satisfy your requirement. It will prevent seperate documents from containing duplicate combinations, but it will still allow a single document to contain duplicate values across an array.

The best option you have is to change your data model so as to split the array of technologyEmployeeRef objects into separate documents. Splitting it up into separate documents will allow you to use a unique index to enforce uniqueness.

The particular implementation that should be taken for this data model change would depend upon your access pattern (which is out of the scope of this question).


One such way this could be done is to create a TechnologyEmployee collection that has all of the fields that currently exist in the technologyEmployeeRef array. Additionally, this TechnologyEmployee collection would have a field, such as email, which would allow you to associate it with a document in the Employee collection.

Sample Employee Document

{
  ....
  ....
  "firstName" : "John",
  "lastName" : "Doe",
  "email" : "john.doe@gmail.com",
  .....
  .....
  .....
}

Sample EmployeeTechnology Document

{
  "email" : "john.doe@gmail.com",
  "technologyCd" : "Java",
  "technologyName" : "Java8",
  ....
  .....
  "status" : "A"
}

Index in EmployeeTechnology collection

{'email' : 1, 'technologyCd' : 1}, {unique: true}

The disadvantage of this approach is that you would need to read from two collections to have all of the data. This drawback may not be a big deal if you rarely need to retrieve the data from both collections at the same time. If you do need all the data, it can be sped up through use of indexes. With the indexes, it could be furthered sped up through the use of covered queries.


Another option is to denormalize the data. You would do this by duplicating the Employee data that you need to access at the same time as the Technology data.

Sample Documents

[
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    "technologyCd" : "Java",
    "technologyName" : "Java8",
    ....
    "status" : "A"
  },
  {
    ....
    "firstName" : "John",
    "lastName" : "Doe",
    "email" : "john.doe@gmail.com",
    .....
    "technologyCd" : "Spring",
    "technologyName" : "Spring Boot2",
    ....
    "status" : "A"
  }
]

In this MongoDB blog post,they say that

You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic.


Or as you've already mentioned, it may make sense to leave the data model as it is and to perform the check for uniqueness on the application side. This could likely give you the best read performance, but it does come with some disadvantages. First, it will slow down write operations because the application will need to run some checks before it can update the database.

It may be unlikely, but there is also a possibility that you could still end up with duplicates. If there are two back-to-back requests to insert the same EmployeeTechnology object into the array, then the validation of the second request may finish (and pass) before the first request has written to the database. I have seen a similar scenario myself with an application I worked on. Even though the application was checking for uniqueness, if a user double-clicked a submit button there would end up being duplicate entries in the database. In this case, disabling the button on the first click drastically reduced the risk. This small risk may be tolerable, depending on your requirements and the impact of having duplicate entries.


Which approach makes the most sense largely depends on your access pattern and requirements. Hope this helps.

jcarter
  • 416
  • 2
  • 5
  • I agree I've this options, but in order to bring the data I would need to keep calling three different collections and merge the data to create DTOs. I've decided to programatically handle if user trying to add technology that already exists for that Employee. Does it suites to Mongo modelling to maintain separate collection just to maintain uniqueness ? Thanks for your nice solution – PAA May 28 '20 at 05:44
  • Whether it makes sense to maintain separate collections really depends on the access pattern and requirements of your application. You may even be surprised at how quickly you can fetch the data. Also, I have added some other options to my answer. – jcarter May 28 '20 at 11:58
  • Agree ! I like that, it make sense – PAA May 28 '20 at 13:03