I am using GridFS to store millions of files. When listing all files, the large result set causes MongoDB to fail. In Python I handle this by passing an empty filter to find():
from pymongo import MongoClient
import gridfs

client = MongoClient("192.168.0.13")
db = client.test
fs = gridfs.GridFS(db)
for f in fs.find():
    pass  # ...relevant Python code
This approach works because .find() returns a cursor, so documents are streamed in batches rather than materialized all at once.
With Scala and Casbah, I could not find a way to do this. No matter what I try, MongoDB performs some operation on the result set that exceeds the memory limit assigned to that operation. My Scala test code is:
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.gridfs.Imports._

val mongoClient = MongoClient("192.168.0.13")
val db = mongoClient("test")
val gridfs = GridFS(db)
for (f <- gridfs) println(f.filename)
Running this code fails with:
Exception in thread "main" com.mongodb.MongoException: Runner error: Overflow sort stage buffered data usage of 33554552 bytes exceeds internal limit of 33554432 bytes
The limit in the message (33554432 bytes = 32 MB) is the server's in-memory sort buffer, so iterating the GridFS handle apparently triggers a server-side sort over the whole result set. I just could not manage to obtain a cursor from Casbah for GridFS access. How do I do it?
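In case it helps frame an answer: one workaround I have been considering (a sketch only, assuming the default "fs" bucket prefix and the same server as above) is to bypass the GridFS wrapper for listing and iterate the underlying fs.files metadata collection directly, since Casbah's MongoCollection.find() with no arguments returns a lazy MongoCursor and issues no server-side sort:

```scala
import com.mongodb.casbah.Imports._

object ListGridFsFiles extends App {
  val mongoClient = MongoClient("192.168.0.13")
  val db = mongoClient("test")

  // GridFS keeps one metadata document per file in the "<bucket>.files"
  // collection; the default bucket prefix is "fs".
  val files = db("fs.files")

  // find() with no query returns a MongoCursor that streams documents
  // in batches, so the full result set is never held in memory at once.
  for (doc <- files.find()) {
    println(doc.getAs[String]("filename").getOrElse("<unnamed>"))
  }

  mongoClient.close()
}
```

This only gives the metadata documents (filename, length, uploadDate, etc.), not file contents, but for producing a listing of millions of files that is all that is needed.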