
According to the MongoDB documentation, it's not recommended to create a multikey index on large arrays. What is the alternative?

I want to notify my app users whenever one of their contacts also starts using the app, so I have to upload and manage each user's contact list. We are using MongoDB with a replica set of one primary and two secondaries. Can MongoDB handle multikey indexing for arrays with hundreds of values? Hundreds of contacts for hundreds of thousands of users can be very hard to manage.

The multikey solution looks like this:

{
  customerId: "id1",
  contacts: ["aaa", "aab", "aac", .... "zzz"]
}

index: createIndex({ contacts: 1 }).
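
To make the cost of this model concrete, here is a toy in-memory sketch of what an index on `contacts` has to hold: one entry per array element per document. It is not MongoDB's actual index structure, and the function names `buildMultikeyIndex` and `findCustomersWithContact` are made up for illustration.

```javascript
// Toy in-memory model of a multikey index on `contacts`.
// Not MongoDB's real index; it only shows that every array element
// becomes its own index entry pointing back at the owning document.
function buildMultikeyIndex(docs) {
  const index = new Map(); // contact value -> customerIds whose array contains it
  let entryCount = 0;
  for (const doc of docs) {
    // Assumption: repeated values inside one array collapse to a single entry.
    for (const contact of new Set(doc.contacts)) {
      if (!index.has(contact)) index.set(contact, []);
      index.get(contact).push(doc.customerId);
      entryCount += 1;
    }
  }
  return { index, entryCount };
}

// "Which customers have this phone in their contacts?" is one keyed lookup.
function findCustomersWithContact(index, phone) {
  return index.get(phone) || [];
}

const docs = [
  { customerId: "id1", contacts: ["aaa", "aab", "aac"] },
  { customerId: "id2", contacts: ["aaa"] },
];
const { index, entryCount } = buildMultikeyIndex(docs);
console.log(entryCount); // 4
console.log(findCustomersWithContact(index, "aaa")); // [ 'id1', 'id2' ]
```

The lookup is cheap, but every uploaded list of N contacts adds about N entries to the index, which is where the write cost comes from.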

Another solution is to save each contact in its own document, along with all the app users related to it:

{
  phone: "aaa",
  contacts: ["id1", "id2", "id3"]
},
{
  phone: "aab",
  contacts: ["id1"]
},
{
  phone: "aac",
  contacts: ["id1"]
},
......
{
  phone: "zzz",
  contacts: ["id1"]
}
index: createIndex( { phone: 1 } )
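
As a rough sketch of how an upload could be written in this second model, the helper below turns one user's contact list into a list of upsert operations in the shape that `bulkWrite` accepts. The helper name `buildContactUpsertOps`, its inputs, and the collection handle `contacts` in the final comment are assumptions for illustration, not part of the original setup.

```javascript
// Build one upsert per phone number for the per-contact document model.
// `customerId` and `phones` are hypothetical inputs taken from the upload.
function buildContactUpsertOps(customerId, phones) {
  const uniquePhones = [...new Set(phones)]; // drop duplicates within one upload
  return uniquePhones.map((phone) => ({
    updateOne: {
      filter: { phone },
      // $addToSet leaves the array unchanged if the customer is already listed.
      update: { $addToSet: { contacts: customerId } },
      upsert: true, // create the contact document the first time it is seen
    },
  }));
}

const ops = buildContactUpsertOps("id1", ["aaa", "aab", "aab", "aac"]);
console.log(ops.length); // 3

// With a driver collection handle (assumed here to be called `contacts`):
// await contacts.bulkWrite(ops, { ordered: false });
```

Sending the operations as one unordered bulk write avoids a round trip per contact. If several uploads can upsert the same phone at once, making the index on `phone` unique keeps concurrent upserts from creating duplicate contact documents.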

Both have poor write performance when uploading a contact list:
the first has to compute a huge index, and the second has to update lots of documents concurrently.
Is there a better way to do it?
I'm using a replica set with two secondaries; could a shard key help?

Thanks

zigy

1 Answer


To index a field that holds an array value, MongoDB creates an index key for each element in the array. These multikey indexes support efficient queries against array fields. So if I were you, my data model would look like this:

{
    customerId: "id1",
    contacts: ["_idx", "_idy", "_idw", .... "_idz"]
}

Then create your index on contacts. MongoDB creates an index on _id by default. You will have to create new documents for the non-app users; just add a field such as "app_user": true/false to tell them apart.

As for index build performance, you can build the index in the background; on replica sets, a background build is the usual approach.

As for sharding, it won't help with your current setup, because a single replica set gives you only one primary to write to. To spread the load you need at least two shards, each with its own primary. In your case, you could add a fourth server, split the machines into two replica sets of one primary and one secondary each, and transform your system into two replicated shards.

Once this is achieved, the load can be balanced between the two shards, provided the shard key distributes writes evenly, even though a hundred documents isn't really much for MongoDB to deal with.

On the other hand, if you go for sharding, you will need more setup: mongos routers and config servers, which must be deployed as a replica set if you're using MongoDB 3.4 or higher.

MrRobot
  • Thanks for replying. Most likely the majority of the contacts each user uploads aren't app users, so they don't have an _id. They would have one if I saved each contact as a document, as in the second option I wrote. Just to clarify again: right now I have more than 300k users. If they all upload ~800 contacts, it will hurt performance to index them all, or to write ~800 new documents for each upload. – zigy Mar 18 '18 at 07:51
  • 800 documents is close to nothing for MongoDB, so it won't cause any write performance issues. On the other hand, if all users upload at the same time, that will cause a bit of a problem, which could be solved by enabling sharding. Most likely there will also be contacts in common among those 300k users, so rely on phone numbers, since they are unique, to avoid duplicates. – MrRobot Mar 18 '18 at 09:22
  • @zigy, could you mark it as the answer if it does indeed answer your question? – MrRobot Mar 19 '18 at 22:04
  • The background build on a replica set is for creating an index on an already populated collection. My problem is the indexing on each new upload. Indexing an array of ~1000 phone numbers for each new upload still hurts performance, and CPU usage spikes to the maximum, even with {background: true}. Next week I'm going to measure the other solution: creating a document for each new contact that tracks all the customers related to it. I'll keep you updated. Thanks! – zigy Mar 22 '18 at 10:25
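
As a back-of-the-envelope check on the figures mentioned in the comments (about 300k users with roughly 800 contacts each), the sketch below counts the writes involved. The function name `estimateContactWrites` is made up, and the result is an upper bound: it ignores contacts shared between users, which would reduce the number of distinct contact documents in the second model.

```javascript
// Rough upper bound on the work caused by contact uploads.
// In the multikey model each contact is one index entry;
// in the per-contact model each contact is one upsert.
function estimateContactWrites(users, contactsPerUser) {
  return {
    perUpload: contactsPerUser,
    total: users * contactsPerUser,
  };
}

const estimate = estimateContactWrites(300000, 800);
console.log(estimate.perUpload); // 800
console.log(estimate.total); // 240000000
```

Either model ends up handling on the order of 240 million entries if every user uploads, so the difference is in how that work is spread over time, not in the total.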