Which metrics should I track to determine whether resources need to be added?

Question

I need to track the resources used by the Vespa services so that I can increase the capacity of the cluster before a limit is reached and avoid any major downtime. I also want to monitor whether all the services are up.

The cluster where I want to implement this is a self-hosted Vespa cluster. One instance is the config node, and the other 3 instances are content and container nodes.

The documents I have referred to are linked below:

https://docs.vespa.ai/en/reference/metrics.html
https://github.com/DataDog/integrations-extras/blob/master/vespa/metadata.csv

From what I understand, content.proton.resource_usage.disk.average is for monitoring the relative amount of disk space used by the content node. Its limit is configured in services.xml as shown below.

<resource-limits>
    <disk>0.78</disk>
</resource-limits>

That means if disk usage is greater than or equal to 0.78 of the total disk space available within the instance then document feeding will be blocked.

content.proton.resource_usage.memory.average is for monitoring the relative amount of memory used by the content node. Its limit is configured in services.xml as shown below.

<resource-limits>
    <memory>0.77</memory>
</resource-limits>

That means if memory usage is greater than or equal to 0.77 of the total memory available within the instance then document feeding will be blocked.

Firstly, is the above understanding correct? If not, what do the above parameters mean?

Also, are there any other metrics that need to be monitored for all the services?

The API that I have found is http://localhost:8080/metrics/v2/values. Is this the right API for all the above-mentioned requirements?

score 1 · Answer 1 · answered Jun 17 '21 at 09:06

Yes, your understanding is correct.
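
As a rough illustration of how that understanding could be turned into an early warning, here is a minimal Python sketch. The two limits are the values from the services.xml snippets in the question; the `WARN_MARGIN` value and the `capacity_warnings` helper are hypothetical names introduced only for this example, not part of Vespa.

```python
# Feed-block limits copied from the services.xml snippets in the question.
LIMITS = {
    "content.proton.resource_usage.disk.average": 0.78,
    "content.proton.resource_usage.memory.average": 0.77,
}

# Hypothetical safety margin: start warning this far below the limit.
WARN_MARGIN = 0.10


def capacity_warnings(usage, limits=LIMITS, margin=WARN_MARGIN):
    """Return the metrics whose usage is within `margin` of their limit.

    `usage` maps metric names to relative usage (0.0 to 1.0).
    """
    warnings = {}
    for name, limit in limits.items():
        value = usage.get(name)
        if value is not None and value >= limit - margin:
            warnings[name] = {
                "usage": value,
                "limit": limit,
                # At or above the limit, feeding is blocked.
                "blocked": value >= limit,
            }
    return warnings
```

Calling `capacity_warnings` with the latest readings for a content node gives the metrics that are approaching their limit, which is the point at which adding capacity is worth considering.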

> The API that I have found out is: http://localhost:8080/metrics/v2/values. Is this the right API for all the above-mentioned requirements?

Yes, but the port is 19092.
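
A minimal Python sketch of polling that endpoint could look like the following. The host `localhost` and the helper names are assumptions made for this example, and because the exact nesting of the JSON response is not shown here, the sketch walks the whole payload looking for the two metric names instead of relying on a particular layout.

```python
import json
from urllib.request import urlopen

# Port 19092 as given above; the host is an assumption.
METRICS_URL = "http://localhost:19092/metrics/v2/values"

WANTED = (
    "content.proton.resource_usage.disk.average",
    "content.proton.resource_usage.memory.average",
)


def collect_values(payload, wanted=WANTED):
    """Recursively collect every numeric value stored under a wanted metric name."""
    found = {name: [] for name in wanted}

    def walk(obj):
        if isinstance(obj, dict):
            for key, val in obj.items():
                if key in found and isinstance(val, (int, float)):
                    found[key].append(val)
                else:
                    walk(val)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)

    walk(payload)
    return found


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch and decode the JSON document served by the metrics API."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

On a running node, `collect_values(fetch_metrics())` would return one list of readings per metric name, which can then be compared against the limits configured in services.xml.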

> Also are there any other attributes that need to be monitored for all the services?

There is no shortage of metrics in Vespa; the ones selected for the Datadog integration are, IMHO, a good start.

score 0 · Answer 2 · answered Jun 17 '21 at 09:33

You have it right already. For reference, these are the metrics used for autoscaling Vespa clusters: https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/vespa/model/admin/monitoring/AutoscalingMetrics.java

Jon
