Querying MongoDB collection with heterogeneous schema efficiently

Question

I'm developing a web application with NodeJS, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that the users can execute the big data processes in a separated cluster, and the results are stored in a MongoDB collection Results. This collection may store more than 1 million of documents per user.

The document schema of this collection can be completely different between users. For instance, we have user1 and user2. Examples of document in the Resultscollection for user1 and user2:

{
 user: ObjectId(user1):, // reference to user1 in the Users collection
 inputFields: {variable1: 3, ...},
 outputFields: { result1: 504.75 , ...}
}

{
 user: ObjectId(user2):,
 inputFields: {country: US, ...},
 outputFields: { cost: 14354.45, ...}
}

I'm implementing a search engine in the web application so that each user can filter in the fields according to the schemas of their documents (for example, user1 must me able to filter by inputFields.variable1, and user2 by outputFields.cost). Of course I know that I must use indexes, otherwise the queries are so slow.

My first attempt was to create an index for each different field in the Results collection, but it's quite inefficient, since the database server becomes unstable because of the size of the indexes. So my second attempt was to try to reduce the amount of indexes by using partial indexes, so that I create indexes specifying the user id in the option partialFilterExpression.

The problem is that if another user has the same schema in the Results collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:

Index with pattern: { inputFields.country: 1 } already exists with different options

It happens because the partial indexes cannot index the same fields even though the partialFilterExpression is different.

So my questions are: How could I allow the users to query their results efficiently in this environmnet? Is MongoDB really suitable for this use case?

Thanks

@AlexBlex I'm not sure if that is a good solution... Anyways, I'm going to check it. Thanks for your suggestion — Álvaro Valencia, Apr 05 '18 at 13:44

Querying MongoDB collection with heterogeneous schema efficiently

0 Answers0