Can I set task.commit.ms to every 1ms?

Question

I have a project that uses Apache Samza, and I have a problem with duplicate data.

This is my checkpoint configuration:

task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
task.checkpoint.system=kafka
task.checkpoint.replication.factor=2
task.commit.ms=20000

The documentation says this about task.commit.ms:

If task.checkpoint.factory is configured, this property determines how often a checkpoint is written. The value is the time between checkpoints, in milliseconds. The frequency of checkpointing affects failure recovery: if a container fails unexpectedly (e.g. due to crash or machine failure) and is restarted, it resumes processing at the last checkpoint. Any messages processed since the last checkpoint on the failed container are processed again. Checkpointing more frequently reduces the number of messages that may be processed twice, but also uses more resources.
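To make the trade-off in that passage concrete, here is a minimal Java sketch (not taken from the Samza documentation) that estimates, for a few values of task.commit.ms, the upper bound on messages replayed after a crash and the number of checkpoint writes per hour. The throughput of 1,000 messages per second is an assumed figure for illustration only, the class and method names are hypothetical, and the sketch assumes checkpoints are written exactly on schedule.

```java
public class CommitIntervalTradeoff {

    // Upper bound on messages processed again after a crash:
    // everything handled since the last checkpoint.
    static long maxReplayedMessages(long commitMs, long messagesPerSecond) {
        return commitMs * messagesPerSecond / 1000;
    }

    // Number of checkpoint writes per hour for a given commit interval.
    static long checkpointsPerHour(long commitMs) {
        return 3_600_000L / commitMs;
    }

    public static void main(String[] args) {
        long assumedRate = 1_000; // hypothetical throughput, messages per second
        long[] intervalsMs = {20_000, 5_000, 250, 1};

        for (long ms : intervalsMs) {
            System.out.printf(
                "task.commit.ms=%d -> up to %d messages replayed, %d checkpoints/hour%n",
                ms, maxReplayedMessages(ms, assumedRate), checkpointsPerHour(ms));
        }
    }
}
```

Under these assumptions, going from 20000 to 1 cuts the worst-case replay from 20,000 messages to 1, but raises the number of checkpoint writes from 180 to 3,600,000 per hour.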

So can I change task.commit.ms from 20000 to 250 or even 1? Is that a good idea or a very bad one? I have a very good cluster.

I need to change this because the Samza worker crashes 1-3 times each week. For now, my temporary solution is to commit the offset each time.

Documentation references:

Apache-Samza

Apache-Samza-Configuration

Why does a program crash 1-3 times every week? Put lead around that computer – Bálint, Aug 09 '16 at 18:15
The problem is a connection issue with the server: my cluster in the US has x nodes, and I have a Samza worker that connects to another cluster in Europe. The sysadmin told me "I don't know where the problem is...", so for me it's very important to fix the duplicate data right now. – MaximeF, Aug 09 '16 at 19:07
You should just set it to 100 ms. Backing up takes time away from the computation. – Bálint, Aug 09 '16 at 19:08

score 0 · Accepted Answer · answered Aug 09 '16 at 19:48


I know my solution won't fit every problem: I changed task.commit.ms to the same value as task.shutdown.ms, i.e. 5000.

Atlas-Samza-Configuration Shutdown

MaximeF

Today I changed task.commit.ms from 5000 to 3000; it works better for me. – MaximeF Aug 10 '16 at 13:58
