3

I'm trying to delete a batch of Couchbase documents quickly, subject to some constraint (if the constraint isn't satisfied, the document is updated instead). In my terminology, each deletion is called a "parcel".

When executing, I run into very strange behavior: the thread in charge of this task works as expected for a few iterations (at best). After this "grace period", Couchbase gets "stuck" and the Observable doesn't call any of its Subscriber's methods (onNext, onCompleted, onError) within the defined period of 30 seconds.

When the latch timeout occurs (see implementation below), the method returns but the Observable keeps executing (I noticed this because it kept printing debug messages while I was stopped at a breakpoint outside the scope of this method). I suspect Couchbase gets stuck because, after a few seconds, many Observables are left in some kind of "ghost" state: alive and reporting to their Subscribers, which in turn have nothing to do because the method in which they were created has already returned. This eventually leads to java.lang.OutOfMemoryError: GC overhead limit exceeded.

I don't know if what I claim here makes sense, but I can't think of another reason for this behavior. How should I properly terminate an Observable upon timeout? Should I? Is there another way around this?

public List<InfoParcel> upsertParcels(final Collection<InfoParcel> parcels) {
    final CountDownLatch latch = new CountDownLatch(parcels.size());

    final List<JsonDocument> docRetList = new LinkedList<JsonDocument>();
    Observable<JsonDocument> obs = Observable
            .from(parcels)
            .flatMap(parcel ->
                        Observable.defer(() -> 
                            {
                                return bucket.async().get(parcel.key).firstOrDefault(null);
                            })
                            .map(doc -> {
                                // In-memory manipulation of the document
                                return updateDocs(doc, parcel);
                            })
                            .flatMap(doc -> {
                                boolean shouldDelete = ... // Decide by inner logic
                                if (shouldDelete) {
                                    if (doc.cas() == 0) {
                                        return Observable.just(doc);
                                    }
                                    return bucket.async().remove(doc);
                                }
                                return (doc.cas() == 0 ? bucket.async().insert(doc) : bucket.async().replace(doc));
                            })
            );

    obs.subscribe(new Subscriber<JsonDocument>() {
                @Override
                public void onNext(JsonDocument doc) {
                    docRetList.add(doc);
                    latch.countDown();
                }

                @Override
                public void onCompleted() {
                    // Due to a bug in RxJava, onError() / retryWhen() does not intercept exceptions thrown from within the map/flatMap methods.
                    // Therefore, we need to recalculate the "conflicted" parcels and send them for update again. 
                    while(latch.getCount() > 0) {
                        latch.countDown();
                    }
                }

                @Override
                public void onError(Throwable e) {
                    // Same reason as above
                    while (latch.getCount() > 0) {
                        latch.countDown();
                    }
                }
            });

    latch.await(30, TimeUnit.SECONDS);

    // Recalculating remaining failed parcels and returning them for another cycle of this method (there's a loop outside)
}
KidCrippler
  • I can't speak for the Couchbase part, but if you think you encountered a bug or performance deficiency in RxJava, please come to the RxJava issue list and post a small code snippet that helps us resolve any issues. – akarnokd Dec 28 '15 at 14:49
  • 10x for being responsive. A few months back, when I found out about this deficiency, I did come to the issue list and saw that there was already an issue opened for it, which also included a fix (in the form of a code snippet). I don't know if this fix has been tested/published, but the team already knows about it. – KidCrippler Dec 28 '15 at 14:59

2 Answers

0

I think this is indeed because a countdown latch doesn't signal the source that the flow of data processing should stop.

You could lean more on RxJava by using toList().timeout(30, TimeUnit.SECONDS).toBlocking().single() instead of collecting into an external list (which is unsynchronized and thus unsafe) and using the CountDownLatch.

This will block until a List of your documents is returned.
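
To illustrate the first point without RxJava or Couchbase, here is a minimal JDK-only sketch. The class and method names are made up for the example, and Thread.sleep stands in for a slow bucket operation. Waiting on a latch with a timeout only stops the waiting; the submitted work keeps running unless something cancels it. Future.cancel(true) plays roughly the role that unsubscribing or timeout() would play in an Rx pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of RxJava or the Couchbase SDK.
class LatchTimeoutDemo {

    // Starts 4 slow tasks, waits on a latch with a short timeout, and returns
    // how many tasks still ran to completion after the caller gave up waiting.
    static int completedDespiteTimeout(boolean cancelOnTimeout) {
        final int tasks = 4;
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        CountDownLatch latch = new CountDownLatch(tasks);
        AtomicInteger completed = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();

        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(() -> {
                try {
                    Thread.sleep(500); // simulated slow operation
                    completed.incrementAndGet();
                    latch.countDown();
                } catch (InterruptedException e) {
                    // Cancelled: restore the flag and stop working.
                    Thread.currentThread().interrupt();
                }
            }));
        }

        try {
            // The timeout (50 ms) is much shorter than the work (500 ms).
            boolean finished = latch.await(50, TimeUnit.MILLISECONDS);
            if (!finished && cancelOnTimeout) {
                // Tell the running work to stop, instead of just walking away.
                for (Future<?> f : futures) {
                    f.cancel(true);
                }
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("latch only:        " + completedDespiteTimeout(false));
        System.out.println("latch plus cancel: " + completedDespiteTimeout(true));
    }
}
```

With cancelOnTimeout set to false, the tasks should all still finish after the caller has stopped waiting; with true, they are interrupted mid-sleep and none should complete.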

Simon Baslé
  • We used this approach before and had to transform all our code to be latch-based because of performance problems related to BlockingObservable. I'm pretty sure I read in the Couchbase documentation that BlockingObservables are not recommended for production environments. BTW, my list is local per thread, why is it unsafe? – KidCrippler Dec 28 '15 at 14:20
  • It can be unsafe because the adding to the list is done asynchronously, most probably from within another thread. If you want to keep the latch, have you tried keeping the result of subscribe (a Subscription) and calling unsubscribe on it when the latch times out? – Simon Baslé Dec 28 '15 at 15:28
  • It's actually a good idea which I've tried. I synchronized the access to the list (within the onNext() method) and unsubscribed after the latch timeout. Unfortunately, I still experience the same problem. – KidCrippler Dec 28 '15 at 16:46
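
On the thread-safety point raised in the comments above: callbacks of an async pipeline may run on several threads, so the list they append to is effectively shared even if only one thread created it. Here is a minimal JDK-only sketch of the difference. The class and method names are invented for the example, and plain pool threads stand in for whatever threads the SDK uses to deliver onNext.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class, not part of RxJava or the Couchbase SDK.
class SharedListDemo {

    // Appends perThread items to sink from each of `threads` workers
    // and returns the size the list reports afterwards.
    static int collect(List<Integer> sink, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    for (int i = 0; i < perThread; i++) {
                        sink.add(i);
                    }
                } catch (RuntimeException e) {
                    // An unsynchronized list may fail in odd ways under a race;
                    // swallow it here so the demo can report the final size.
                } finally {
                    done.countDown();
                }
            });
        }

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return sink.size();
    }

    public static void main(String[] args) {
        int threads = 8;
        int perThread = 10_000;
        System.out.println("expected:          " + threads * perThread);
        // May report fewer elements than expected; the result varies per run.
        System.out.println("plain LinkedList:  "
                + collect(new LinkedList<Integer>(), threads, perThread));
        System.out.println("synchronized list: "
                + collect(Collections.synchronizedList(new LinkedList<Integer>()),
                          threads, perThread));
    }
}
```

The synchronized wrapper should always report the full count, while the plain LinkedList can silently lose elements. Note that this only addresses the list itself; it does not stop the pipeline from running after the latch times out.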
0

When you create your Couchbase environment in code, set computationPoolSize to something large. When the Couchbase client runs out of threads for async work, it just stops working and won't ever call the callback.