StackDriver Custom Metric for Nodejs event loop latency

Question

I am trying to build a custom metric for Google Stackdriver that I can use to track Node.js event loop latency. All the apps are running on Google App Engine, so as far as I understand I am confined to the `global` monitored resource type.

Using the Node.js @google-cloud/monitoring client, I have created a metric descriptor that looks like this:

{
  name: client.projectPath(projectId),
  metricDescriptor: {
    description: 'Nodejs event loop latency',
    displayName: 'Event Loop Latency',
    type: 'custom.googleapis.com/nodejs/eventloop/latency',
    metricKind: 'GAUGE',
    valueType: 'DOUBLE',
    unit: '{ms}',
    labels: [
      {
        key: 'instance_id',
        valueType: 'STRING',
        description: 'The ID of the instance reporting latency (containerId, vmId, etc.)',
      },
    ],
  },
}

And I am writing data to this custom metric like this:

const timeSeriesData = {
  metric: {
    type: 'custom.googleapis.com/nodejs/eventloop/latency',
    labels: {
      instance_id: instanceId,
    },
  },
  resource: {
    type: 'global',
    labels: {
      project_id: projectId,
    },
  },
  points: [{
    interval: {
      endTime: {
        seconds: item.at,
      },
    },
    value: {
      doubleValue: item.value,
    },
  }],
};

I thought all was good while writing my tests, until I changed my instance_id and tried to write data for a timespan overlapping one that another fake instance had already written. Now the monitoring client throws this error:

Error: One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified was older than the most recent stored point.

This renders my custom metric pretty much useless, since only one Node.js process can ever write to it.

Now my question is, how can I circumvent this? I want to be able to write from all of my nodejs instances running (x AppEngine services with y instances running).

I was thinking a type that is indexed on nodejs/eventloop/latency/{serviceName}/{serviceVersion}/{instanceId} but it seems a bit extreme and will quickly bring me towards the quotas on the StackDriver account.

Any suggestions are highly appreciated!

Summit Raj · Accepted Answer · 2018-02-04T17:49:39.973

Time series data for custom metrics in Stackdriver must be written in time order, as documented in https://cloud.google.com/monitoring/custom-metrics/creating-metrics#which-resource.

A workaround for this is to create a separate time series for every instance writing to the metric by adding a user-defined label for the instance_id. Each unique combination of label values creates its own time series, and the ordering requirement applies within each time series. You can also add separate labels for service_name or service_version if you need them. However, be mindful of the cardinality of the label values: creating too many time series on a single metric can degrade query performance.
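As a rough illustration of the labelling approach described above, here is a minimal sketch of a payload builder that tags every point with instance, service and version labels. The helper names `buildTimeSeries` and `identityFromEnv` are hypothetical, the project ID and sample values are made up, and reading the identity from the `GAE_INSTANCE`, `GAE_SERVICE` and `GAE_VERSION` environment variables is an assumption about the App Engine runtime. As far as I know, any extra label used here would also have to be declared in the metric descriptor's `labels` list.

```javascript
// Hypothetical helper: builds one time series entry for the reporting instance.
// service_name and service_version are illustrative label names.
function buildTimeSeries(projectId, identity, item) {
  return {
    metric: {
      type: 'custom.googleapis.com/nodejs/eventloop/latency',
      labels: {
        instance_id: identity.instanceId,
        service_name: identity.serviceName,
        service_version: identity.serviceVersion,
      },
    },
    resource: {
      type: 'global',
      labels: { project_id: projectId },
    },
    points: [
      {
        interval: { endTime: { seconds: item.at } },
        value: { doubleValue: item.value },
      },
    ],
  };
}

// Assumption: on App Engine these environment variables identify the running
// instance, service and version. The fallbacks cover local runs.
function identityFromEnv(env) {
  return {
    instanceId: env.GAE_INSTANCE || 'local',
    serviceName: env.GAE_SERVICE || 'unknown',
    serviceVersion: env.GAE_VERSION || 'unknown',
  };
}

// Made-up sample values, for illustration only.
const example = buildTimeSeries(
  'my-project',
  identityFromEnv({ GAE_INSTANCE: 'abc123', GAE_SERVICE: 'api', GAE_VERSION: 'v7' }),
  { at: 1517700000, value: 12.5 }
);
console.log(JSON.stringify(example.metric.labels));
```

In a real service the object returned by `buildTimeSeries` would be passed to the monitoring client in place of the hand-written payload from the question. Keeping the label set this small is deliberate, since every additional label multiplies the number of time series.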

For more details on what a time series is, see https://cloud.google.com/monitoring/api/v3/metrics-details#intro-time-series.
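To make the time series notion more concrete, the following is a simplified in-memory model of per-series ordering. It is not Stackdriver's implementation: `seriesKey`, `OrderedStore` and `make` are hypothetical names, and the rule that a point must be strictly newer than the latest stored point of the same series is an assumption made for this sketch.

```javascript
// Simplified model: a series is identified by the metric type, the metric
// labels, the resource type and the resource labels taken together.
function seriesKey(ts) {
  const sorted = (labels) =>
    Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`).join(',');
  return [
    ts.metric.type,
    sorted(ts.metric.labels || {}),
    ts.resource.type,
    sorted(ts.resource.labels || {}),
  ].join('|');
}

class OrderedStore {
  constructor() {
    this.latest = new Map(); // series key -> end time of the newest stored point
  }

  // Accepts the point only if it is newer than the newest point of the SAME series.
  write(ts) {
    const key = seriesKey(ts);
    const at = ts.points[0].interval.endTime.seconds;
    if (this.latest.has(key) && at <= this.latest.get(key)) {
      return false;
    }
    this.latest.set(key, at);
    return true;
  }
}

// Hypothetical factory for a single-point series written by one instance.
const make = (instanceId, at) => ({
  metric: {
    type: 'custom.googleapis.com/nodejs/eventloop/latency',
    labels: { instance_id: instanceId },
  },
  resource: { type: 'global', labels: { project_id: 'my-project' } },
  points: [{ interval: { endTime: { seconds: at } }, value: { doubleValue: 1 } }],
});

const store = new OrderedStore();
console.log(store.write(make('a', 100))); // true
console.log(store.write(make('b', 50)));  // true: a different series
console.log(store.write(make('a', 90)));  // false: older than the newest point of 'a'
```

In this model, two instances with different `instance_id` values never block each other, while an out-of-order write within one instance's series is rejected.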

edited Feb 04 '18 at 17:49

answered Feb 03 '18 at 22:09


Thanks Summit - I was hoping though that the custom metrics had a time-series for each label I created, as this would have solved my problem. I guess I will have to go for creating unique `type`s to track metrics from all the running instances. – nover Feb 04 '18 at 17:35
No, you do not need to create a unique metric `type` for each instance. My recommendation was to use a single metric, with `instance_id` as a label. Each unique combination of label values creates a time series. I have clarified this a bit in my answer. – Summit Raj Feb 04 '18 at 17:40
