
I need to monitor Vespa metrics, so I am trying to push them to CloudWatch.

This is the documentation I am following: https://docs.vespa.ai/documentation/monitoring.html

I have added the credentials file and granted the PutMetricData permission to the attached IAM role. The services.xml file that I am using looks like this:

      <admin version="2.0">
          <adminserver hostalias="admin0"/>
          <configservers>
              <configserver hostalias="admin0"/>
          </configservers>
          <monitoring>
          </monitoring>
          <metrics>
              <consumer id="my-cloudwatch">
                  <metric-set id="vespa" />
                  <cloudwatch region="ap-south-1" namespace="vespa">
                      <shared-credentials file="~/.aws/credentials" profile="default" />
                  </cloudwatch>
              </consumer>
          </metrics>
      </admin>

I have deployed the application using `vespa-deploy prepare application.zip && vespa-deploy activate`, but I still do not see any metrics in CloudWatch.

Also, I have tried to add:

<monitoring>
  <interval>1</interval>
  <systemname>vespa</systemname>
</monitoring>

But I get this error when deploying:

Request failed. HTTP status code: 400
Invalid application package: default.default: Error loading model: XML error in services.xml: element "interval" not allowed here; expected the element end-tag [9:16], input:

How can I fix this issue, or at least debug it?

Yash Kasat

1 Answer


I suggest using an absolute path to the credentials file, as the ~ may not resolve to the directory you intended at runtime.

A couple more things:

  • I recommend using the default metric set, as the vespa metric set contains a lot of metrics, which will drive your CloudWatch cost higher. If you need additional metrics, you can add them with the metric tag inside consumer.
  • The monitoring element doesn't do anything useful in this context, so you should just drop it.
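
The suggestions above could be combined into a services.xml fragment along these lines. This is only a sketch: the credentials path and the extra metric id are hypothetical placeholders, not values from the question, and the comments below suggest the default metric set is only recognized by newer Vespa releases.

```xml
<admin version="2.0">
    <adminserver hostalias="admin0"/>
    <configservers>
        <configserver hostalias="admin0"/>
    </configservers>
    <!-- The empty monitoring element from the question is dropped. -->
    <metrics>
        <consumer id="my-cloudwatch">
            <!-- Smaller metric set, to keep CloudWatch cost down. -->
            <metric-set id="default" />
            <!-- Hypothetical metric id: replace with a metric you actually need. -->
            <metric id="my-extra-metric" />
            <cloudwatch region="ap-south-1" namespace="vespa">
                <!-- Assumed absolute path: point this at the real credentials file. -->
                <shared-credentials file="/home/vespa/.aws/credentials" profile="default" />
            </cloudwatch>
        </consumer>
    </metrics>
</admin>
```

The credentials file must be readable by the user that runs the Vespa services on the node, which is one reason an absolute path is easier to reason about than ~.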

If you still don't see any metrics, please check for warnings or errors in the Vespa log file (use vespa-logfmt) and the Telegraf log file: /opt/vespa/logs/telegraf/telegraf.log. (Vespa uses Telegraf internally to emit metrics to CloudWatch.)

gjoranv
  • Thanks for the suggestion. I applied the changes you described, but I still do not see any metrics pushed to CloudWatch. I checked the log in /opt/vespa/logs/vespa/vespa.log and found this error message: – Yash Kasat May 09 '20 at 16:21
  • com.yahoo.container.di.componentgraph.core.ComponentNode$ComponentConstructorException: Error constructing 'ai.vespa.metricsproxy.telegraf.Telegraf': null\nCaused by: java.io.UncheckedIOException: java.io.FileNotFoundException: /etc/telegraf/telegraf.conf (No such file or directory) – Yash Kasat May 09 '20 at 16:27
  • Also, when I tried to use the default metric set, I got this error when deploying: Invalid application package: default.default: Error loading model: No such metric-set: default – Yash Kasat May 09 '20 at 16:35
  • @YashKasat, I suspect you need a newer version of Vespa. Which version are you currently using? – gjoranv May 09 '20 at 19:12
  • The version of Vespa that I am using is 7.145.41. – Yash Kasat May 10 '20 at 09:21
  • @YashKasat This feature was not complete until 7.191. As always, I suggest using the latest version available. – gjoranv May 11 '20 at 08:17
  • I have updated Vespa to 7.216.10, but I still get the error. However, the path it looks for has changed to /opt/vespa/conf/telegraf/telegraf.conf. For reference, these are the properties that I have set in the pom file: UTF-8 true 7.216.10 4.11 – Yash Kasat May 15 '20 at 14:07
  • Also I am seeing this error: slobrok vespa-slobrok.rpcserver info managed server vespa/service/admin/metrics/ at tcp/:19094 failed: (RPC) Connection error – Yash Kasat May 15 '20 at 14:17
  • This was a bug on our side. Sorry for the inconvenience! It will be fixed in the next release. I'll update this thread with the exact version number as soon as I know. – gjoranv May 18 '20 at 14:47
  • Vespa 7.225.71 has been released, containing the bugfix. – gjoranv May 26 '20 at 08:19
  • Thanks for the update, but I am still getting error logs. What other changes do I need to make to resolve the issue? For reference, the updated pom.xml properties: UTF-8 true 7.225.71 4.11 – Yash Kasat May 31 '20 at 11:40
  • Does the log still display the same error message regarding the telegraf.conf file? – gjoranv May 31 '20 at 13:26
  • We are unfortunately unable to reproduce the error. Before we do a deeper analysis, can you please run the following command to verify that all relevant packages have the correct version: `rpm -qa | grep vespa` – gjoranv Jun 02 '20 at 14:42