How to get uptime total and percentage of GCP compute vm instance through MQL?

Question

I am trying to get the total uptime of a single GCP Compute Engine VM instance, inclusive of restarts. I've seen multiple posts on this, but none of them use MQL.

E.g.: if the instance was not running for 1 hr in the past 24 hours, I expect the MQL query to return 23 hrs.

In the snapshot and code snippet below, the graph represents the max uptime but doesn't account for restarts. I've tried using a secondary aggregator with max, but the query still doesn't report the exact value.

If you have any idea how to get the total uptime for the past day through MQL, that would be very helpful. Any pointers are much appreciated. Thank you.

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| group_by 1d, [value_uptime_total_max: max(value.uptime_total)]
| every 1d

score 1 · Answer 1 · answered Apr 14 '21 at 19:27


You can try the uptime metric instead:

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime'
| filter (metric.instance_name == 'instance-1')
| align delta(1d)
| every 1d
| group_by [], [value_uptime_mean: mean(value.uptime)]

so you get a graph similar to this one:


Hi_Esc


A quick question: isn't mean(value.uptime) giving the mean value rather than the exact uptime interval in a given day? From my observations, uptime is reset on restarts. Can you please let me know whether your query considers this case? – Mozhi Apr 15 '21 at 05:00

I've done some tests, and you can even remove the aggregator and it seems to work. – Hi_Esc Apr 16 '21 at 23:44

score 0 · Answer 2 · answered Apr 20 '21 at 11:32

The GCP Compute Engine VM metrics instance/uptime and instance/uptime_total are not reliable. Instead, tracking uptime through an uptime check and using the following MQL query gives the exact values for historical uptime.

Replace 30d with an appropriate value, such as 1d or 1h.

fetch uptime_url
| metric 'monitoring.googleapis.com/uptime_check/check_passed'
| filter (metric.check_id == 'dev-uptime-test')
| group_by 30d,
    [value_check_passed_fraction_true: fraction_true(value.check_passed)]
| every 30d | mean
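The query above reports a fraction of passed checks rather than a duration. To get the "23 hrs" style figure from the question, that fraction can be multiplied by the window length. Here is a minimal Python sketch of that arithmetic; the function name and the sample reading are hypothetical, and it assumes the checks are evenly spaced so that the passed fraction approximates the fraction of time the instance was up.

```python
def uptime_hours(fraction_passed: float, window_hours: float) -> float:
    """Convert a fraction of passed uptime checks into hours of uptime.

    Assumes checks are evenly spaced over the window, so the passed
    fraction approximates the fraction of time the target was reachable.
    """
    if not 0.0 <= fraction_passed <= 1.0:
        raise ValueError("fraction_passed must be between 0 and 1")
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return fraction_passed * window_hours


# Hypothetical reading: half of the checks passed in a 1d window.
print(uptime_hours(0.5, 24))
```

Note that an uptime check measures reachability of the checked URL, so a running instance with a failing service would also count as down.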

score 0 · Accepted Answer · answered Mar 17 '22 at 19:56

Use sliding in the group_by together with the sum aggregator for the calculation:

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| filter (metric.instance_name = "the instance name you need")
| group_by [], sliding(1d), [value_uptime_total_sum: sum(value.uptime_total)]
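For the percentage part of the question, the value for the 1d window can be divided by the window length. Below is a rough Python sketch of that conversion; the function name and the sample value are hypothetical, and it assumes the number you feed in really is the seconds of uptime within the window, which is worth verifying against a known outage before relying on it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def uptime_percentage(uptime_seconds: float,
                      window_seconds: float = SECONDS_PER_DAY) -> float:
    """Return uptime as a percentage of the window length.

    Assumes uptime_seconds is the time the instance was running
    inside the window, in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return 100.0 * uptime_seconds / window_seconds


# Hypothetical value: 23 hours of uptime in the past day.
print(round(uptime_percentage(23 * 3600), 2))
```

Dividing the same value by 3600 gives the uptime in hours instead.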
