
Setup: Scala 2.11.4, Play Framework 2.3.7, ReactiveMongo (tried with both 0.10.5.0.akka23 and 0.11.0-SNAPSHOT).

We have a collection with 18'000 entities and process it asynchronously using the Enumerator/Iteratee approach.

Case 1. The processing is simple (extracting entities to CSV format and sending them in chunks as a REST response): everything works fine, all records are extracted and processed.

Case 2. The processing involves calculations that take up to 10 seconds each, plus an update of each record after the calculation. The calculation is driven by a foreach Iteratee, which updates the number of processed entities in an internal task tracker. The processing might take a while, but that's OK:

        val evaluation = Patient.findByClient(clientName) &>
            Enumeratee.mapM(patient => {
                val evaluatedAndSaveTask = patient.
                    evaluate(parser).
                    flatMap(patientOpt =>
                        patientOpt.
                            map(evaluatedPatient => evaluatedPatient.saveAndGet().map(Some(_))).
                            getOrElse(Future.successful(None))
                    )
                evaluatedAndSaveTask.recover({
                    case t =>
                        t.printStackTrace()
                        None
                })
            })
        // Step 2.1. Running evaluation process through Iteratee
        val evaluationTask = evaluation run Iteratee.foreach(patientOpt => {
            collection.update(Json.obj("clientName" -> clientName), Json.obj("$inc" -> Json.obj("processedPatients" -> 1)))
        })
        // Step 2.2. Log completion and errors
        evaluationTask.onSuccess({ case _ => Patient.LOG.info("PatientEvaluation DONE") })
        evaluationTask.onFailure({ case t =>
            t.printStackTrace()
            Patient.LOG.info("PatientEvaluation FAILED")
        })

In this case only 575 entities get processed, and the Iteratee finishes, printing out "PatientEvaluation DONE".

I removed the save step from the equation, and it did not help.

Why can that be?

mavarazy

1 Answer


I finally found the culprit of the problem: MongoDB automatically closes an idle cursor after a timeout. You can specify the noCursorTimeout flag to prevent this:

        collection.
            find(findQ).
            sort(if (sortQF.values.isEmpty) sortQ else sortQF).
            options(QueryOpts(skipN = offset + page._1 * page._2).noCursorTimeout).
            cursor[T]

For some reason, ReactiveMongo does not throw an exception in this case; it just closes the Enumerator silently. I created an issue in ReactiveMongo, https://github.com/ReactiveMongo/ReactiveMongo/issues/250, to follow up on this.
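Since the driver closes the stream silently, one defensive measure is to compare the number of documents actually processed against the collection's total count. A minimal pure-Scala sketch of that check (the expected count and the delivered documents below are hypothetical stand-ins for the result of `collection.count()` and the cursor's output):

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical stand-in for collection.count()
val expectedCount: Future[Int] = Future.successful(18000)
// Hypothetical stand-in for what a silently-closed cursor delivered
val processedDocs: Seq[Int] = 1 to 575

// Compare what actually came through against the collection size,
// so a silently truncated stream is at least detected and logged.
val check: Future[String] = expectedCount.map { expected =>
  val processed = processedDocs.size
  if (processed < expected) s"TRUNCATED: $processed of $expected" else "OK"
}

println(Await.result(check, 5.seconds))
```

This does not fix the timeout, but it turns the silent truncation into something you can alert on.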

For now, it might be safer for me to let the cursor expire and restart the query with an offset.
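That restart-with-offset fallback can be sketched as follows. This is a plain-Scala illustration of the paging logic, not ReactiveMongo API: `fetchPage` is a hypothetical stand-in for re-running the query with `QueryOpts(skipN = offset)`, so each page gets a fresh cursor and no single cursor lives long enough to hit the server-side timeout:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import ExecutionContext.Implicits.global

// Hypothetical page fetcher; with ReactiveMongo each call would re-run the
// query with QueryOpts(skipN = offset), opening a fresh cursor per page.
def fetchPage(offset: Int, pageSize: Int, total: Int): Future[Seq[Int]] =
  Future.successful(offset until math.min(offset + pageSize, total))

// Process the whole collection page by page, sequentially, counting
// how many documents were handled in total.
def processAll(pageSize: Int, total: Int, handle: Int => Future[Unit]): Future[Int] = {
  def loop(offset: Int, processed: Int): Future[Int] =
    fetchPage(offset, pageSize, total).flatMap { page =>
      if (page.isEmpty) Future.successful(processed)
      else
        page.foldLeft(Future.successful(())) { (acc, doc) =>
          acc.flatMap(_ => handle(doc))      // one document at a time
        }.flatMap(_ => loop(offset + page.size, processed + page.size))
    }
  loop(0, 0)
}

val count = Await.result(
  processAll(pageSize = 100, total = 575, _ => Future.successful(())), 5.seconds)
```

Note that restarting with skip-based offsets assumes a stable sort order and gets slower for large offsets, since the server still walks the skipped documents.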

  • That is correct. Mongo only retains cursors for 10 minutes, after which they expire, which is why your case 2 was getting exhausted after 575 entries. Note that 10 minutes = 600 seconds, so you were processing those calculations in just over 1 second each on average, not 10 seconds. – Reid Spencer Jan 20 '15 at 14:57
  • It might be that the last batch took 10 min, while others were faster, that is why it came to 30 min – mavarazy Jan 20 '15 at 15:01
  • Does ReactiveMongo close the cursor automatically? With this option Mongo will never close the cursor automatically. – Amir Karimi May 08 '16 at 08:41
  • It should on enumeration exhaustion, but I've not tested this :) – mavarazy May 08 '16 at 17:05