I'm using MongoDB 2.6.1.
I have a collection that stores emails, project-wise. The documents look like this (I have omitted the 'Raw Email Text' key for readability):
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d6"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 05:05:35 IST 2014",
"To" : "manisha.bhopate@infostretch.com; ",
"From" : "Shubhangi Thorat",
"CC" : "NO VALUES",
"Subject" : "RE: pics",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d7"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 05:02:38 IST 2014",
"To" : "manisha.bhopate@infostretch.com; ",
"From" : "Shubhangi Thorat",
"CC" : "NO VALUES",
"Subject" : "FW: pics",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
{
"_id" : ObjectId("540d4ae7eea013be22f1f0d8"),
"Project_Id" : "E11593",
"Project_Name" : "National Hearing Care- Novo",
"Email_Id" : "E11593.monitor@lntinfotech.com",
"Date" : "Mon Sep 08 04:37:47 IST 2014",
"To" : "Prachi Sutrawe; ",
"From" : "Mahindra Shambharkar",
"CC" : "NO VALUES",
"Subject" : "Accepted: Show and tell -Sale",
"Unique_Id" : "Mon-Sep-08-11:51:20-IST-2014"
}
I had the following thoughts when selecting the shard key:
- Build a compound index {Project_Id, _id}, since Project_Id has low cardinality but _id has high cardinality
- A hashed index on 'Date' or 'Unique_Id', both of which are timestamps
- A hashed index on the 'From' field, but its cardinality depends on the number of people involved in the project
- 'To' and 'CC' are multi-value keys and 'Subject' is highly random, so I'm not sure whether these keys can be used at all
- While not listed in the output, 'Raw_Text' will be read extensively by different applications, but I'm not sure whether an index should be built on this key, let alone used for sharding
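To make the cardinality concerns above concrete, here is a rough sketch that counts the distinct values each candidate key takes in the three sample documents shown earlier. It is plain JavaScript with no driver involved: the `cardinality` helper is my own illustration (not a MongoDB API), and the `_id` values are held as plain strings instead of ObjectIds so that it runs standalone.

```javascript
// The three sample documents, trimmed to the fields considered as shard keys.
// Assumption: _id is kept as a plain string so no MongoDB driver is needed.
const docs = [
  { _id: "540d4ae7eea013be22f1f0d6", Project_Id: "E11593",
    Date: "Mon Sep 08 05:05:35 IST 2014", From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d7", Project_Id: "E11593",
    Date: "Mon Sep 08 05:02:38 IST 2014", From: "Shubhangi Thorat",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" },
  { _id: "540d4ae7eea013be22f1f0d8", Project_Id: "E11593",
    Date: "Mon Sep 08 04:37:47 IST 2014", From: "Mahindra Shambharkar",
    Unique_Id: "Mon-Sep-08-11:51:20-IST-2014" }
];

// Hypothetical helper: number of distinct values a candidate key takes.
// For a compound key, the values of all listed fields are combined.
function cardinality(documents, fields) {
  const seen = new Set(
    documents.map(d => JSON.stringify(fields.map(f => d[f])))
  );
  return seen.size;
}

// Candidate keys taken from the list above.
const candidates = [
  ["Project_Id"],
  ["Project_Id", "_id"],
  ["Date"],
  ["Unique_Id"],
  ["From"]
];

for (const fields of candidates) {
  console.log(fields.join(" + "), "->", cardinality(docs, fields));
}
```

On these three documents, 'Project_Id' and 'Unique_Id' each take a single value, 'From' takes two, and both 'Date' and the compound {Project_Id, _id} take three. Three documents are far too few to draw conclusions from, but the same count could be run over a larger export of the collection.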
What would be the optimal shard key in this case?