I need to track the resources used by the Vespa services so that I can increase the cluster's capacity before a limit is reached and avoid major downtime. I also want to monitor whether all the services are up.
The cluster where I want to implement this is self-hosted: one instance is the config node, and the other three instances run as content and container nodes.
The documents I referred to are listed below:
https://docs.vespa.ai/en/reference/metrics.html
https://github.com/DataDog/integrations-extras/blob/master/vespa/metadata.csv
From what I understand,
content.proton.resource_usage.disk.average
reports the relative amount of disk space used by the content node, and its limit is configured in services.xml as shown below.
<resource-limits>
    <disk>0.78</disk>
</resource-limits>
That means that if disk usage is greater than or equal to 0.78 (78%) of the total disk space available on the instance, document feeding will be blocked.
Similarly,
content.proton.resource_usage.memory.average
reports the relative amount of memory used by the content node, and its limit is configured in services.xml as shown below.
<resource-limits>
    <memory>0.77</memory>
</resource-limits>
That means that if memory usage is greater than or equal to 0.77 (77%) of the total memory available on the instance, document feeding will be blocked.
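To make my understanding concrete, here is a minimal Python sketch of the check I have in mind. It assumes that feeding is blocked once the relative usage is greater than or equal to the configured limit, which is exactly the part I am asking to have confirmed. The function names and the 0.10 safety margin are my own placeholders, not anything defined by Vespa.

```python
def feed_blocked(usage, limit):
    """True if relative usage has reached the configured limit.

    Assumption: blocking happens when usage >= limit, as described above.
    """
    return usage >= limit


def needs_scaling(usage, limit, margin=0.10):
    """True if usage is within `margin` of the limit.

    `margin` is a hypothetical safety buffer so that capacity can be
    added before feeding would actually be blocked.
    """
    return usage >= limit - margin
```

For example, with the disk limit of 0.78 above, I would want an alert from `needs_scaling(usage, 0.78)` well before `feed_blocked(usage, 0.78)` becomes true.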
First, is the above understanding correct? If not, what do the above parameters mean?
Also, are there any other attributes that should be monitored for all the services?
The API that I have found is http://localhost:8080/metrics/v2/values
Is this the right API for all the requirements mentioned above?
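This is roughly how I plan to read a metric from that endpoint, as a minimal sketch using only the Python standard library. I have not verified the exact layout of the JSON response, so instead of hard-coding a path the helper walks the whole document and collects every numeric value stored under the metric name. `fetch_payload` and `find_metric` are names I made up for illustration.

```python
import json
from urllib.request import urlopen

# URL taken from the question; adjust host and port for other nodes.
METRICS_URL = "http://localhost:8080/metrics/v2/values"


def fetch_payload(url=METRICS_URL, timeout=10):
    """Download and parse the JSON document served by the metrics API."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def find_metric(node, name):
    """Recursively collect every numeric value stored under the key `name`.

    Walking the whole document avoids depending on the response layout,
    which is an assumption here rather than something I have confirmed.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == name and isinstance(value, (int, float)):
                found.append(value)
            else:
                found.extend(find_metric(value, name))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_metric(item, name))
    return found
```

Calling `find_metric(fetch_payload(), "content.proton.resource_usage.disk.average")` on a container node should then return one value per place the metric is reported, if this is indeed the right API.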