1

Our production MarkLogic database holds 1.2 TB of data spread across 6 forests. We plan to add 2 new forests to reduce the number of stands per forest.

Adding new forests starts rebalancing the data. That's fine, and we expect it to take time. But the estimated rebalancing time keeps shooting up whenever merges start alongside rebalancing. Sometimes the estimate jumps suddenly from 8 hours to 16 hours. On average, the whole process takes approximately 24 hours.

My question is: if we disable merges before adding the new forests, and run a manual merge as soon as rebalancing completes (after adding the forests), would the combined process be faster? And would it be safe to do this?

  • An estimate of 8 to 16 hours does not add up to 24 hours in total. Keep in mind that the estimated remaining time takes current load into account: the estimate grows when load increases and shrinks when load decreases. If the rebalance is estimated to take 8 hours with typical load, I'd expect it to take longer than that because of inevitable merges along the way, but less than 16 hours, as merges should be relatively short. Keep an eye on free space, by the way. – grtjn Apr 02 '20 at 16:40
  • Thanks @grtjn. Yes, we made sure to avoid any other load. The stats in the question are from UAT, where we took a full day of downtime. The 8 to 16 hours was just an example of the estimate shooting up and down, which happened every time a merge started during rebalancing (sometimes 8 or 9 merges ran simultaneously). When it started, it showed an estimate of 16 hours; ultimately it completed in ~23 hours. Just checked, free space is ample. – akash kapil Apr 06 '20 at 12:45
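
To illustrate the point about the estimate moving with load, here is a minimal Python sketch. It assumes the remaining time is a simple extrapolation from the current throughput; that is an assumption made for illustration, not a description of how MarkLogic computes its estimate. The function name and all numbers are hypothetical.

```python
def eta_hours(remaining_gb: float, rate_gb_per_hour: float) -> float:
    """Naive time-remaining estimate: remaining work divided by the current rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return remaining_gb / rate_gb_per_hour


# Hypothetical numbers, chosen only to mirror the 8 h / 16 h swing in the question.
remaining_gb = 240.0
quiet_rate = 30.0            # GB/hour while no merges compete for disk I/O
busy_rate = quiet_rate / 2   # assume concurrent merges halve rebalancer throughput

print(eta_hours(remaining_gb, quiet_rate))  # estimate while the disks are quiet
print(eta_hours(remaining_gb, busy_rate))   # estimate while merges are running
```

With these inputs the estimate doubles while merges run and drops back once they finish, so neither figure is the total duration; the actual time lands somewhere between the two, depending on how long the merges last.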

2 Answers

2

Anything that affects disk I/O will affect the speed of rebalancing, including merging and standard database activity. However, take care if you disable merging.

The risk of disabling merging is that you prevent the system from pruning stands. If too many stands accumulate, you may hit the hard limit, which will impact server operation.

If merging is having such a heavy impact, you can look at tuning the merge configuration. More information can be found in the documentation.

Mads Hansen
Mike Gardner
  • You could also try setting a background-io-limit to help throttle merges: https://docs.marklogic.com/admin:group-set-background-io-limit – Mads Hansen Apr 02 '20 at 16:17
0

In addition to the other info provided, the assignment policy may affect how much work is done; see for example https://docs.marklogic.com/guide/admin/database-rebalancing#id_81616. You can also set the rebalancer throttle to make it work more slowly if the system is getting overwhelmed. But if you turn off merging while rebalancing, I'd bet you'll hit a TOOMANYSTANDS error pretty quickly: the rebalancer causes small stands to be written, and they won't be able to merge into fewer, larger stands.
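
As a rough sanity check on how much work the rebalancer has to do, the Python sketch below computes the smallest amount of data that must move to spread the question's 1.2 TB evenly over 8 forests instead of 6. It assumes the data is evenly distributed both before and after, and that only the share destined for the new forests moves. Since the assignment policy affects how much work is done, the real figure may well be higher. The function name is hypothetical.

```python
def min_data_to_move_gb(total_gb: float, old_forests: int, new_forests: int) -> float:
    """Lower bound on data moved when adding forests, assuming an even spread."""
    if old_forests <= 0 or new_forests < 0:
        raise ValueError("forest counts must be positive (old) and non-negative (new)")
    total_forests = old_forests + new_forests
    # Each new forest ends up with total/total_forests, all of it moved from old forests.
    return total_gb * new_forests / total_forests


total_gb = 1200.0                  # 1.2 TB from the question, in decimal GB
per_forest_before = total_gb / 6   # data per forest with 6 forests
per_forest_after = total_gb / 8    # data per forest with 8 forests
to_move = min_data_to_move_gb(total_gb, old_forests=6, new_forests=2)

print(per_forest_before, per_forest_after, to_move)
```

Under these assumptions, at least a quarter of the data has to be rewritten into new stands on the new forests, and those stands are what merging later consolidates.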

asusu