Cassandra 'nodetool repair -pr' taking way too much time

Question

I am running a cluster with one datacenter (10 nodes), with Cassandra 2.1.7 installed on each node. We are using SimpleStrategy (an old mistake).

I have not run nodetool repair since the beginning, and the cluster now holds about 200 GB of data with a replication factor of 3.

Since a full repair and an incremental repair amount to the same thing at this point, I tried running a full repair, but this brought the coordinator node down.

So I ended up running a repair restricted to each node's primary partition ranges (nodetool repair -pr), one node at a time. But this takes far too long (15+ hours per node, hence weeks for all nodes).

Am I doing something wrong, or is this expected? Or is it a version problem?

In the future, if I run a full repair again after this one finishes, will it take weeks as well?

score 2 · Accepted Answer · answered Apr 03 '17 at 09:22

The duration of a full repair is driven mainly by data size, so a future full repair should take about the same amount of time.

I suggest moving to incremental repairs, which should save you time and resources.

Here's an article about how to do this in 2.1: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
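
As a rough illustration, the per-node migration sequence could be written down as a small Python helper that only builds the list of steps (it runs nothing). This is a sketch from my reading of the 2.1 migration procedure, so check each command against the linked page before using it; the function name `migration_steps` and the file name `sstables.txt` are made up for the example.

```python
def migration_steps(sstable_list_file):
    """Return (description, command) pairs for migrating ONE node.

    A command of None marks a manual step (stopping or starting the
    Cassandra process depends on how it was installed).
    `sstable_list_file` is a hypothetical file listing the SSTables that
    existed before autocompaction was disabled.
    """
    return [
        ("Disable autocompaction on the node",
         "nodetool disableautocompaction"),
        ("Run a full, sequential repair",
         "nodetool repair"),
        ("Stop the node", None),
        ("Mark the pre-existing SSTables as repaired",
         f"sstablerepairedset --is-repaired -f {sstable_list_file}"),
        ("Start the node again", None),
    ]


# Assumption: once migrated, incremental repairs on 2.1 are run in
# parallel mode with these flags.
INCREMENTAL_REPAIR = "nodetool repair -par -inc"

if __name__ == "__main__":
    for number, (what, cmd) in enumerate(migration_steps("sstables.txt"), 1):
        print(f"{number}. {what}: {cmd or '(manual step)'}")
    print("afterwards:", INCREMENTAL_REPAIR)
```

The sequence is applied to one node at a time, so the rest of the cluster keeps serving traffic while a node is stopped.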

nevsv

Would incremental repair be sufficient for the overall health of the cluster? Can it eliminate the need for a full repair? – r005t3r Apr 05 '17 at 09:16
Sure, that's what it is supposed to solve. It is also the default type of repair in later versions. – nevsv Apr 05 '17 at 09:20

score 1 · Answer 2 · answered May 19 '17 at 07:35

If your data size is too large, you can use sub-range repair. It is similar to repair -pr, but each run repairs only a sub-range of the token range.

For a more detailed explanation: https://www.pythian.com/blog/effective-anti-entropy-repair-cassandra
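
To make the idea concrete, here is a minimal Python sketch that cuts one token range into equal sub-ranges and builds a `nodetool repair -st ... -et ...` command for each. It assumes the default Murmur3Partitioner token bounds; the function names and the keyspace name `my_keyspace` are hypothetical, and in practice the start and end tokens would come from the ranges the node actually owns (for example from `nodetool ring`) rather than the whole ring.

```python
# Token bounds of Murmur3Partitioner (assumption: the cluster uses it).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous pieces.

    Wrap-around ranges (end <= start) are not handled in this sketch.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("wrap-around ranges are not supported here")
    width = end - start
    # Integer arithmetic only, so large tokens lose no precision.
    bounds = [start + width * i // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(keyspace, start, end, parts):
    """Build one sub-range repair command per piece (nothing is executed)."""
    return [
        f"nodetool repair -st {s} -et {e} {keyspace}"
        for s, e in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    for cmd in repair_commands("my_keyspace", MIN_TOKEN, MAX_TOKEN, 4):
        print(cmd)
```

Each command covers a smaller slice of data, so a single run finishes sooner and a failed run only needs that slice repeated.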

V.HL