I would like to de-duplicate my dataset, which has 2 billion records. I have an index on url, and I want to iterate through every record in url order and check whether it is a duplicate.
The index is 110GB.
My current method won't run, apparently because the index is so large. It fails with this error:
MongoDB.Driver.MongoCommandException: 'Command find failed: Executor error during find command :: caused by :: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit..'
Here is the code:
    var filter = Builders<Page>.Filter.Empty;
    var sort = Builders<Page>.Sort.Ascending("url");
    await collection.Find(filter).Sort(sort)
        .ForEachAsync(async document =>
        {
            Console.WriteLine(document.Url);
            //_ = await collection.DeleteOneAsync(a => a.Id == document.Id);
        });
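To make the intended duplicate check concrete, here is a minimal sketch of the comparison step only, kept independent of MongoDB. It assumes the records arrive already ordered by url, that ids can be represented as strings, and that two urls are duplicates when they match exactly (ordinal comparison). The names `SortedDedup` and `DuplicateIds` are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDedup
{
    // Yields the Id of every record whose Url equals the Url of the record before it.
    // Assumption: the input is already sorted by Url, so duplicates are adjacent.
    public static IEnumerable<string> DuplicateIds(IEnumerable<(string Id, string Url)> sortedByUrl)
    {
        string previousUrl = "";
        bool isFirst = true;

        foreach (var (id, url) in sortedByUrl)
        {
            // The first record of each run of equal urls is kept; the rest are reported.
            if (!isFirst && string.Equals(url, previousUrl, StringComparison.Ordinal))
            {
                yield return id;
            }

            previousUrl = url;
            isFirst = false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data, already ordered by url.
        var records = new[]
        {
            ("1", "a"), ("2", "a"), ("3", "b"),
            ("4", "c"), ("5", "c"), ("6", "c"),
        };

        Console.WriteLine(string.Join(", ", SortedDedup.DuplicateIds(records)));
        // prints "2, 5, 6"
    }
}
```

Only the previous url is held in memory, so the check itself stays constant in space no matter how many records are streamed through it. The ids it yields are the ones the commented-out delete above would target.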