I'm developing a web application with NodeJS, MongoDB and Mongoose. It is intended to act as an interface between the user and a big data environment. The idea is that the users can execute the big data processes in a separated cluster, and the results are stored in a MongoDB collection Results
. This collection may store more than 1 million of documents per user.
The document schema of this collection can be completely different between users. For instance, we have user1
and user2
. Examples of document in the Results
collection for user1
and user2
:
{
user: ObjectId(user1):, // reference to user1 in the Users collection
inputFields: {variable1: 3, ...},
outputFields: { result1: 504.75 , ...}
}
{
user: ObjectId(user2):,
inputFields: {country: US, ...},
outputFields: { cost: 14354.45, ...}
}
I'm implementing a search engine in the web application so that each user can filter in the fields according to the schemas of their documents (for example, user1
must me able to filter by inputFields.variable1
, and user2
by outputFields.cost
). Of course I know that I must use indexes, otherwise the queries are so slow.
My first attempt was to create an index for each different field in the Results
collection, but it's quite inefficient, since the database server becomes unstable because of the size of the indexes. So my second attempt was to try to reduce the amount of indexes by using partial indexes, so that I create indexes specifying the user id in the option partialFilterExpression
.
The problem is that if another user has the same schema in the Results
collection as any other user and I try to create the indexes for this user, MongoDB throws this exception:
Index with pattern: { inputFields.country: 1 } already exists with different options
It happens because the partial indexes cannot index the same fields even though the partialFilterExpression
is different.
So my questions are: How could I allow the users to query their results efficiently in this environmnet? Is MongoDB really suitable for this use case?
Thanks