
I have a question regarding how far we can push the limits of a content node.

My setup is one machine being stateless and the other being a content node. I noticed that when feeding a lot of documents (around 50k characters each), the node fails at around 80 million docs, which is about 1 TB of data.

The content node has 4 TB of storage and 115 GB of memory. I do not store anything as attribute, only summary and index.

The problem is that I can't manage to identify the cause of the content node's failure, for example which metrics to look at to pinpoint the problem.

I thoroughly read the sizing documentation, but I did not find my answer. Do you have any hints on where to look?

Robin

2 Answers


Did you check the vespa.log file on the content node? You might get some hints there.
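For a quick scan, something like the following could help. This is a minimal Python sketch, not an official Vespa tool: it assumes Linux, the default log location under /opt/vespa, and the tab-separated vespa.log layout (time, host, pid, service, component, level, message); adjust the path and field indexes for your installation.

```python
#!/usr/bin/env python3
"""Scan a Vespa log for warning/error/fatal entries around the failure.

A sketch only: assumes the tab-separated vespa.log format with the
level in the sixth field, and the default log path under /opt/vespa.
"""
import sys

LOG_PATH = "/opt/vespa/logs/vespa/vespa.log"  # assumed default location
BAD_LEVELS = {"warning", "error", "fatal"}

def scan(path: str) -> None:
    with open(path, errors="replace") as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 7 and fields[5] in BAD_LEVELS:
                # time, service, level and message are usually enough
                # to spot what happened just before the node went down
                print(fields[0], fields[3], fields[5], fields[6], sep="  ")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else LOG_PATH)
```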

Also, depending on your system configuration, you might be running out of file descriptors on the content node.
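One way to check this while feeding is to compare the process's open descriptors against its soft limit. A minimal Python sketch, assuming Linux /proc; the process name vespa-proton-bin is an assumption, so adjust it to match what runs on your node:

```python
#!/usr/bin/env python3
"""Compare a process's open file descriptors against its soft limit.

A sketch only: assumes Linux /proc and that the proton process is
named 'vespa-proton-bin' (an assumption -- verify on your node).
"""
import os

PROC_NAME = "vespa-proton-bin"  # assumed proton process name

def find_pids(name: str):
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                # /proc/<pid>/comm is truncated to 15 characters
                if f.read().strip().startswith(name[:15]):
                    yield int(pid)
        except OSError:
            continue

for pid in find_pids(PROC_NAME):
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        limit = next(l.split()[3] for l in f if l.startswith("Max open files"))
    print(f"pid {pid}: {open_fds} open fds, soft limit {limit}")
```

If the count creeps toward the limit as the feed progresses, raising the descriptor limit for the Vespa services would be the thing to try.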

  • I did check the log; nothing stood out as the reason for the failure. The file descriptor limit may be a good hint; I'll have to refeed to see whether that is the case. – Robin Mar 28 '18 at 08:31

Could you please define "the node will fail"? How does it fail? If you manage to run out of memory, the OOM killer might be coming for your proton-bin process (https://linux-mm.org/OOM_Killer). What is the resource utilization prior to the failure?
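One way to capture utilization prior to the failure is to sample it during the feed and inspect the tail of the output after the crash. A minimal Python sketch, assuming Linux /proc and that you pass the proton pid as the argument; an RSS approaching the machine total right before the crash points at the OOM killer (check dmesg or journalctl for its traces):

```python
#!/usr/bin/env python3
"""Sample memory utilization over time so the state just before a
failure is captured. A sketch only: assumes Linux /proc and takes the
pid to watch as its argument."""
import sys
import time

def meminfo_kb(key: str) -> int:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])
    raise KeyError(key)

def rss_kb(pid: int) -> int:
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

pid = int(sys.argv[1])
total = meminfo_kb("MemTotal")
while True:  # run alongside the feed; stop with Ctrl-C
    avail = meminfo_kb("MemAvailable")
    rss = rss_kb(pid)
    print(f"{time.strftime('%H:%M:%S')}  rss={rss // 1024} MiB  "
          f"available={avail // 1024} MiB  total={total // 1024} MiB",
          flush=True)
    time.sleep(10)
```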

Jo Kristian Bergum