I have a user collection with 52 million records. Each user document has an array of comments, and there is a unique index on the comment_id field.
{
    _id: 123,
    user_name: "xyz",
    comments: [
        {
            comment_id: 123,
            text: "sd"
        },
        {
            comment_id: 234,
            text: "sdf"
        },
        ......,
        (63000 elements)
    ]
}
The totalIndexSize of the comment_id index is 104 GB. Around 100 documents out of the 52M have 63,000 elements in the comments array.
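For context, the numbers above come from queries along these lines (a mongosh sketch; the 50000 cutoff is just an illustrative threshold for "oversized", not a fixed value):

    // Per-index sizes in bytes, including the comment_id index.
    db.user.stats().indexSizes

    // Count the documents whose comments array is very large.
    // $ifNull guards against documents with no comments field.
    db.user.aggregate([
        { $project: { n: { $size: { $ifNull: ["$comments", []] } } } },
        { $match: { n: { $gt: 50000 } } },
        { $count: "oversized" }
    ])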
My aim is to delete the old comments and reduce the size of the comments array by over 80%. Earlier, I tried to replace the array with this query:

    db.user.updateOne({ _id: _id }, { $set: { comments: newCommentsArray } }, { upsert: true })

Here newCommentsArray has around 400 elements. This operation took around 130 sec to execute.
My questions are:
1) What could be the reason the update query above took 130 sec? Is it because of the huge unique index on the comment_id field? (I believe that replacing the whole comments array forces the index to remove the entries for all 63,000 deleted elements and then insert entries for the new ones.)
2) Another approach I considered is to use $pull: remove 100 comments from the comments array, wait 5 sec, then run the next batch of 100, and so on (see the sketch after this list). What do you think of this solution?
3) If the above solution is no good, can you suggest a better way to reduce the comments array by over 80%?
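For clarity, this is roughly what I mean by the batched $pull in question 2 (a mongosh sketch; oldCommentIds is a hypothetical array holding the comment_ids scheduled for deletion, oldest first):

    // Pull 100 old comments at a time, pausing 5 sec between batches
    // so the unique index is updated in small increments.
    const batchSize = 100;
    for (let i = 0; i < oldCommentIds.length; i += batchSize) {
        const batch = oldCommentIds.slice(i, i + batchSize);
        db.user.updateOne(
            { _id: _id },
            { $pull: { comments: { comment_id: { $in: batch } } } }
        );
        sleep(5000);  // mongosh built-in, milliseconds
    }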