I'm using MongoDB 2.6.1

I have a collection that stores emails per project. The documents look like this (I have omitted the 'Raw Email Text' key for readability):

{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d6"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:05:35 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "RE: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d7"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 05:02:38 IST 2014",
        "To" : "manisha.bhopate@infostretch.com; ",
        "From" : "Shubhangi Thorat",
        "CC" : "NO VALUES",
        "Subject" : "FW: pics",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
        "_id" : ObjectId("540d4ae7eea013be22f1f0d8"),
        "Project_Id" : "E11593",
        "Project_Name" : "National Hearing Care- Novo",
        "Email_Id" : "E11593.monitor@lntinfotech.com",
        "Date" : "Mon Sep 08 04:37:47 IST 2014",
        "To" : "Prachi Sutrawe; ",
        "From" : "Mahindra Shambharkar",
        "CC" : "NO VALUES",
        "Subject" : "Accepted: Show and tell -Sale",
        "Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}

These are the options I considered when selecting the shard key:

  1. Build a compound index {Project_Id, _id}, since Project_Id has low cardinality but _id has high cardinality
  2. A hashed index on 'Date' or 'Unique_Id', both of which are timestamps
  3. A hashed index on the 'From' field, but its cardinality depends on the number of people involved in the project
  4. 'To' and 'CC' are multi-valued keys and 'Subject' is highly random, so I am not sure whether these keys can be used at all
  5. Although it is not shown in the output above, 'Raw_Text' will be read extensively by different applications, but I am not sure whether this key should be indexed, let alone used for sharding
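
To make the cardinality concern concrete, here is a rough sketch in plain JavaScript that counts the distinct values of each candidate field across the three sample documents shown above. The names `sampleEmails` and `distinctCount` are made up for illustration, and three documents are far too few to draw conclusions about the real collection.

```javascript
// The three sample documents from above, reduced to the candidate shard-key fields.
// (sampleEmails and distinctCount are hypothetical names used only for this sketch.)
const sampleEmails = [
  {
    Project_Id: "E11593",
    Date: "Mon Sep 08 05:05:35 IST 2014",
    From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014",
  },
  {
    Project_Id: "E11593",
    Date: "Mon Sep 08 05:02:38 IST 2014",
    From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014",
  },
  {
    Project_Id: "E11593",
    Date: "Mon Sep 08 04:37:47 IST 2014",
    From: "Mahindra Shambharkar",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014",
  },
];

// Count how many distinct values a field takes across the given documents.
function distinctCount(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Collect the distinct-value count for every candidate field.
const candidates = ["Project_Id", "Date", "From", "Unique_Id"];
const cardinality = {};
for (const field of candidates) {
  cardinality[field] = distinctCount(sampleEmails, field);
}

console.log(cardinality);
```

In this sample, 'Unique_Id' is identical in all three documents while 'Date' differs in each, so the two timestamp fields may not behave alike as shard keys. On the real collection, something like `db.<collection>.distinct("From").length` in the mongo shell should give the same kind of count.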

What would be the optimal shard key in this case?

Kaliyug Antagonist