0

I am trying to get the total uptime of a single GCP Compute Engine VM instance, inclusive of restarts. I've seen multiple posts on this, but none that use MQL.

E.g., if the instance was not running for 1 hour in the past 24 hours, I expect the MQL query to return 23 hours.

In the snippet and screenshot below, the graph represents the max uptime but doesn't account for restarts. I've tried using a secondary aggregator with max, but the query still doesn't report the exact value.

If you have any idea how to get the total uptime over the past day through MQL, that would be very helpful. Any pointers are much appreciated. Thank you.

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| group_by 1d, [value_uptime_total_max: max(value.uptime_total)]
| every 1d

enter image description here
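To illustrate why taking the max undercounts: if the uptime samples behave like a counter that resets to zero at each restart (an assumption based on the behavior described here, not something confirmed by the metric documentation), the max only captures the longest single run. A minimal Python sketch with made-up sample values and a hypothetical helper `total_uptime_seconds`:

```python
from typing import Optional, Sequence


def total_uptime_seconds(samples: Sequence[float]) -> float:
    """Sum uptime across restarts.

    Assumes `samples` are time-ordered readings of a counter that grows
    while the VM runs and drops back toward zero after a restart.
    """
    total = 0.0
    previous: Optional[float] = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                # Same run: add the growth since the last reading.
                total += value - previous
            else:
                # Counter dropped, so a restart happened: the new run
                # has accumulated `value` seconds so far.
                total += value
        previous = value
    return total


# Made-up readings: a 2 h run, a restart, then a 30 min run.
readings = [0, 3600, 7200, 0, 1800]
longest_run = max(readings)                 # what a max aggregator sees
all_runs = total_uptime_seconds(readings)   # what the question asks for
```

Here `longest_run` is 7200 seconds while `all_runs` is 9000 seconds, which is the gap between the graph and the expected value.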

kdima
Mozhi

3 Answers

1

You can try the uptime metric instead:

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime'
| filter (metric.instance_name == 'instance-1')
| align delta(1d)
| every 1d
| group_by [], [value_uptime_mean: mean(value.uptime)]

This gives a graph similar to this one:

enter image description here

Hi_Esc
  • A quick question: doesn't mean(value.uptime) give the mean value rather than the exact uptime interval for a given day? From my observations, uptime is reset on restarts. Can you please let me know whether your query handles this case? – Mozhi Apr 15 '21 at 05:00
  • I've done some tests, and you can even remove the aggregator and it seems to work. – Hi_Esc Apr 16 '21 at 23:44
0

The GCP Compute Engine VM metrics instance/uptime and instance/uptime_total are not reliable. Tracking uptime through an uptime check instead, with the following MQL query, gives the exact values for historical uptime.

Replace 30d with an appropriate window, such as 1d or 1h.

fetch uptime_url
| metric 'monitoring.googleapis.com/uptime_check/check_passed'
| filter (metric.check_id == 'dev-uptime-test')
| group_by 30d,
    [value_check_passed_fraction_true: fraction_true(value.check_passed)]
| every 30d
| mean
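The query above returns a fraction of passed checks rather than a duration. Assuming the checks are evenly spaced across the window, that fraction can be converted to an approximate uptime by multiplying it by the window length. A minimal Python sketch (the helper `uptime_hours` and its inputs are hypothetical, not part of any Google Cloud API):

```python
def uptime_hours(fraction_passed: float, window_hours: float) -> float:
    """Approximate uptime in hours from the fraction of passed checks.

    Assumes checks are evenly spaced, so each check represents an
    equal slice of the window.
    """
    if not 0.0 <= fraction_passed <= 1.0:
        raise ValueError("fraction_passed must be between 0 and 1")
    if window_hours < 0:
        raise ValueError("window_hours must be non-negative")
    return fraction_passed * window_hours


# Example: 23 of 24 hourly checks passed over a 1-day window.
approx = uptime_hours(23 / 24, 24)
```

With those inputs `approx` comes out at roughly 23 hours, matching the example in the question. The resolution is limited by how often the uptime check runs.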
Mozhi
0

Use sliding in the group_by with a sum aggregator for the calculation:

fetch gce_instance
| metric 'compute.googleapis.com/instance/uptime_total'
| filter (metric.instance_name = "the instance name you need")
| group_by [], sliding(1d), [value_uptime_total_sum: sum(value.uptime_total)]
Cheng Hou