0

I need help because I don't know why my aggregate is very slow. I have one database with 2 collections :

users (id, name, email) with 40 000 elements

posts (id, author, content) with 39 000 elements

I want to know all post posted by an user. I use casbah and this is my code:

val aggregationOptions = AggregationOptions(AggregationOptions.CURSOR)
val content_return = 
MongoClient("localhost", 27017)("Blog")("users").aggregate(
List(MongoDBObject("$lookup" -> MongoDBObject("from" -> "posts", "localField" -> "name", "foreignField" -> "author", "as" -> "posts")),
     MongoDBObject("$unwind" -> "$posts")),
     aggregationOptions)

If anyone have an idea, thank you very much !

Ned Z
  • 9
  • 1
  • 2
    do you have an index on the posts collection for the author field? if you don't the $lookup will run a full table scan on 39.000 posts for each of the 40.000 users. – Pedro Goes May 06 '18 at 21:02
  • 3
    More to the point, **nobody** has a practical use for a "join" in **all** items in two collections. Realistically you will have a "filter" to select only "some" of the actual entries needed, and you need to make this happen "before" the `$lookup` and certainly the `$unwind`. If you have real query conditions to apply then please explain them within your question. But the two operations with no filter is expected to have a very high cost. – Neil Lunn May 06 '18 at 21:18
  • Pedro Goes, I have the basic index "_id" yeah. Neil Lunn, without filter it takes 380s I think it is very long no ? – Ned Z May 06 '18 at 22:50
  • @NedZ Do the maths. This query causes 40k lookups to happen. That's 9,5 milliseconds per lookup (actually a bit less, taking the overhead of the aggregation into account). Blazing fast in my book. The problem is that you apply the SQL thought model to MongoDB. MongoDB is **no SQL** database. So modelling your data like it was will get you into trouble. – Markus W Mahlberg May 07 '18 at 07:14

0 Answers0