Fiware: Data loss prevention

Question

I'm working with version 0.27.0 of the Orion Context Broker. I'm using the Cygnus generic enabler, and I have set up an MQTT agent that connects external devices to the Context Broker.

My main concern right now is how to prevent data loss. I set up the Context Broker and Cygnus MongoDB databases as replica sets, but that doesn't guarantee that all data will be persisted to the databases. I have seen that Cygnus uses Apache Flume. Looking at its configuration, the number of re-injection retries can be configured:

# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries) 
cygnusagent.sources.http-source.handler.events_ttl = -1

Is it a good idea to set the retries value to -1? I have read about events being re-injected into the channel forever. What can be done to ensure that all the data is persisted? Is there any functionality in the FIWARE ecosystem meant for that purpose?

Answer (answered Feb 19 '16 at 11:26)

Regarding Cygnus, the TTL is indeed the way to control persistence retries after an error. A retry means the data is re-injected into the internal channel that connects the source (which receives Orion notifications) with the sink (which persists the data in the final storage), so that it can be persisted in a later attempt.

Possible values for this TTL are:

TTL = 0: there are no retries, i.e. if notified data cannot be persisted in the final storage on the first attempt (because of a network failure, a storage error, whatever), then the data is dropped.
TTL > 0: there are as many retries as the configured TTL. Once the TTL is exhausted, the data is dropped.
TTL = -1: infinite retries, i.e. the data is re-injected into the channel forever, until it is persisted or the channel gets full.
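To make the three cases concrete, here is a minimal simulation of the source -> channel -> sink loop. It is not Cygnus code: the `Event`, `Channel`, `sink_step` and `run` names are hypothetical, and it assumes one persistence attempt per step and a storage that is down for the first `outage_steps` steps. It only mimics the TTL rules listed above.

```python
from collections import deque


class Event:
    """Hypothetical notified event carrying its remaining retries (-1 = infinite)."""

    def __init__(self, payload, ttl):
        self.payload = payload
        self.ttl = ttl


class Channel:
    """Bounded queue standing in for the Flume channel between source and sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        """Return False (the new event is rejected) when the channel is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(event)
        return True


def sink_step(channel, storage_ok, persisted, dropped):
    """Take one event and try to persist it; re-inject it on failure if the TTL allows."""
    if not channel.queue:
        return
    event = channel.queue.popleft()
    if storage_ok:
        persisted.append(event.payload)
        return
    if event.ttl == 0:
        dropped.append(event.payload)  # no retries left: the data is lost
        return
    if event.ttl > 0:
        event.ttl -= 1  # consume one retry; ttl == -1 never changes
    channel.queue.append(event)  # re-inject for a later attempt


def run(ttl, outage_steps, total_steps, capacity=10):
    """Send one event while the storage is down for the first outage_steps steps."""
    channel = Channel(capacity)
    persisted, dropped = [], []
    channel.put(Event("e1", ttl))
    for step in range(total_steps):
        sink_step(channel, step >= outage_steps, persisted, dropped)
    return persisted, dropped
```

With a 3-step outage, `run(0, 3, 5)` and `run(2, 3, 5)` drop the event, while `run(3, 3, 5)` and `run(-1, 3, 5)` persist it. With TTL = -1 and the storage down, re-injected events keep occupying slots, so once the channel is full `Channel.put` starts rejecting new events.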

As mentioned, a TTL of -1 may exhaust the channel capacity if the final storage never recovers, preventing newly received data from being put into the channel. Nevertheless, if the final storage never recovers, that drawback does not matter, right? :)

Thus, we could say the rules for choosing a TTL are:

If you don't want retries, simply configure 0.
If you want retries but don't mind losing data after a certain number of retries, configure a positive value.
If you want retries and don't want to lose data, configure -1 together with a large channel capacity, since the final storage may be down for an unknown amount of time.
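For the last rule, a back-of-the-envelope estimate of "large" is arrival rate times outage duration. The helper below is a hypothetical sizing aid, not a Cygnus feature. It assumes a constant notification rate and that nothing leaves the channel while the storage is down; `safety_factor` is an arbitrary margin for bursts and for events already queued before the outage.

```python
import math


def required_channel_capacity(events_per_second, outage_seconds, safety_factor=2.0):
    """Rough lower bound on channel slots needed to ride out an outage with TTL = -1.

    Assumes a constant arrival rate and that every failed event is re-injected,
    so the backlog grows linearly for the whole outage.
    """
    if events_per_second < 0 or outage_seconds < 0:
        raise ValueError("rate and outage must be non-negative")
    return math.ceil(events_per_second * outage_seconds * safety_factor)
```

For example, 50 notifications per second during a 10-second outage gives 500 events of backlog, or 1000 slots with the default margin.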

In any case, the TTL feature is changing during this sprint. The behaviour will be the same, but instead of being applied to single events, it will be applied to batches of events (a batch may contain a single event, of course). You'll see this change in the next release of Cygnus (0.13.0), which will be available at the end of February 2016 (as of this writing, next week :)). My recommendation is to wait for that release if you want to use the TTL feature intensively.
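The switch from per-event to per-batch TTL can be sketched as follows. Again this is a hypothetical model, not the 0.13.0 implementation: `make_batches`, `persist_batches` and the `storage_ok_at` callback are illustrative names, and the point is only that a batch succeeds, is retried, or is dropped as a whole. Note that with `ttl=-1` and a storage that never recovers, the loop never ends, which mirrors the infinite re-injection described above.

```python
from collections import deque


def make_batches(events, batch_size):
    """Split a list of events into consecutive batches of at most batch_size."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


def persist_batches(batches, ttl, storage_ok_at):
    """Persist batches, applying the TTL to each batch rather than to each event.

    storage_ok_at(attempt) -> bool says whether the storage accepts the write
    made at the given global attempt number. Returns (persisted, dropped).
    """
    queue = deque((batch, ttl) for batch in batches)
    persisted, dropped = [], []
    attempt = 0
    while queue:
        batch, remaining = queue.popleft()
        ok = storage_ok_at(attempt)
        attempt += 1
        if ok:
            persisted.extend(batch)  # the whole batch is persisted together
        elif remaining == 0:
            dropped.extend(batch)  # the whole batch is lost together
        else:
            # Consume one retry unless the TTL is infinite (-1), then re-inject.
            queue.append((batch, remaining - 1 if remaining > 0 else remaining))
    return persisted, dropped
```

With five events in batches of two and a storage that fails the first two writes, `ttl=0` loses the first two batches (four events), while `ttl=1` eventually persists all five.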

If I understand correctly, this new release won't fix our current issue. Right now, if the replica set's primary node goes out of service, it takes ten seconds to elect a new primary. All the events arriving during that window will be rejected by the database and discarded instead of being re-injected into the channel, right? (Julen, Feb 24 '16 at 10:58)
Not necessarily. Nothing will be discarded if, for instance, you configure a TTL of -1, or a TTL large enough that the events are still in the channel, ready for a persistence retry, when the MongoDB replica set is up again. (frb, Feb 26 '16 at 06:37)
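"Large enough" can be estimated from the outage length and the spacing between retries: the last retry must happen after the storage is back. The helper below is a hypothetical calculation, not a Cygnus setting. It assumes retries are spaced roughly `retry_interval_seconds` apart, which in practice depends on how fast the sink drains the channel.

```python
import math


def min_ttl_for_outage(outage_seconds, retry_interval_seconds):
    """Smallest TTL whose last retry lands at or after the end of the outage.

    Assumes the first attempt fails at t = 0 and the k-th retry happens at
    t = k * retry_interval_seconds.
    """
    if retry_interval_seconds <= 0:
        raise ValueError("retry interval must be positive")
    if outage_seconds < 0:
        raise ValueError("outage must be non-negative")
    return math.ceil(outage_seconds / retry_interval_seconds)
```

For the ten-second primary election mentioned above, retries one second apart would need a TTL of at least 10, and retries three seconds apart a TTL of at least 4.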
Right now I have configured a TTL of -1 and increased the channel capacity to 10,000. Even so, during the 10 seconds the election of a new primary takes, all executed operations are lost. It seems they receive a database error and are not re-injected. (Julen, Feb 29 '16 at 09:34)
In addition, with the same configuration (event_TTL: -1, channel_capacity: 10000), I have seen that if several entities are created less than one second apart, many of them are not stored by Cygnus. I also tried the latest Cygnus version (0.13) with batch_TTL set to -1, but I still keep losing data. Any ideas? (Julen, Mar 11 '16 at 10:43)