2

I have a service that accepts and processes tasks. A Task has a status: queued, running, failed, cancelled or finished. Once in a while the service writes a log entry containing JSON, like this:

2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 0, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }

I would like to plot a pie chart that shows the breakdown of the tasks by status ("running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks" in the JSON message). So far I have failed to do so, because I cannot work out how to construct a table out of such a message. Any clues would be highly appreciated. Thanks in advance!

Denethor

3 Answers

1

Try something like this. Basically, you have to de-transpose the data. I hope this makes sense!

...
| parse field=some_log_line "INFO - *" as jsonMessage
| json field=jsonMessage "running_tasks"
| json field=jsonMessage "queued_tasks"
| json field=jsonMessage "finished_tasks"
| "running_tasks,queued_tasks,finished_tasks," as message_keys
| parse regex field=message_keys "(?<message_key>.*?)," multi
| if (message_key="running_tasks", running_tasks, 0) as message_value
| if (message_key="queued_tasks", queued_tasks, message_value) as message_value
| if (message_key="finished_tasks", finished_tasks, message_value) as message_value
| fields message_key, message_value
| max(message_value) by message_key
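
To make the de-transposing step easier to follow, here is a minimal Python sketch of the same idea, not a Sumo Logic feature. The `records` list, the `unpivot` and `max_by_key` helpers, and the sample numbers are hypothetical and only illustrate the shape of the data: each wide record becomes one (message_key, message_value) row per key, and the rows are then reduced with a max per key.

```python
# Hypothetical parsed log records, one dict per log line (wide format).
records = [
    {"running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0},
    {"running_tasks": 5, "queued_tasks": 1, "finished_tasks": 0},
]
message_keys = ["running_tasks", "queued_tasks", "finished_tasks"]


def unpivot(records, keys):
    """Yield one (message_key, message_value) row per key per record."""
    for record in records:
        for key in keys:
            # Assumption of this sketch: a key absent from a record counts as 0.
            yield key, record.get(key, 0)


def max_by_key(rows):
    """Rough analogue of `max(message_value) by message_key`."""
    result = {}
    for key, value in rows:
        result[key] = max(value, result.get(key, value))
    return result


long_rows = list(unpivot(records, message_keys))
summary = max_by_key(long_rows)
print(summary)
```

The resulting table has one row per status, which is the shape a pie chart needs: a single aggregate column split by a category column.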
the-nick-wilson
  • That seems to do the trick, thank you! However, I have some questions: 1. It seems to work even if I remove the 3 "json field=..." clauses (lines 3-5). 2. Why is there a "0" in "if (message_key="running_tasks", running_tasks, 0)"? 3. If I run the query as is, running_tasks shows "1". If I add another field, "cancelled_tasks", running_tasks shows 0. So far I couldn't figure out why... any clues? – Denethor Sep 14 '21 at 09:06
  • Sure thing! 1: I think those lines were just artifacts from me trying to test in my own env. I think you can safely ditch those. 2: 0 is the initial default, since a count will always start at 0. 3: I think I'd have to see the query after you added that extra field. I'm wondering if you didn't update something somewhere in a copy/paste? – the-nick-wilson Sep 14 '21 at 15:50
  • here's what I came up with: `| "running_tasks,queued_tasks,finished_tasks,cancelled_tasks," as message_keys | parse regex field=message_keys "(?<message_key>.*?)," multi | if (message_key="running_tasks", running_tasks, 0) as message_value | if (message_key="queued_tasks", queued_tasks, message_value) as message_value | if (message_key="finished_tasks", finished_tasks, message_value) as message_value | if (message_key="cancelled_tasks", cancelled_tasks, message_value) as message_value | fields message_key, message_value | max(message_value) by message_key` – Denethor Sep 15 '21 at 09:23
  • Hmm, that looks right to me. Hard to tell without being able to see your log lines. The only thing I can think of is that the timerange you ran it for with that new field added happened to encounter a later log line that showed running_tasks: 0 (or there was no running_tasks key). Make sure you're running it for an absolute timerange while you're troubleshooting so you know you're looking at constant data. – the-nick-wilson Sep 16 '21 at 21:47
0

First of all, Sumo Logic supports parsing JSON into fields. In your example only the part after "INFO - " is JSON, not the whole line, so you can add this to your query:

...
| parse "INFO - *" as jsonMessage
| json auto

Then, you can use running_tasks, queued_tasks, etc. as ordinary fields, e.g.

...
| timeslice 1m
| max(running_tasks), max(queued_tasks) by _timeslice
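
For readers less familiar with time slicing, the following Python sketch shows roughly what a one-minute slice with a max aggregate computes. It is only an illustration under assumptions: the `samples` list and the `max_per_minute` helper are hypothetical, and the second sample is made up to show two log lines landing in the same minute.

```python
from datetime import datetime

# Hypothetical (timestamp, running_tasks) samples taken from parsed log lines.
samples = [
    ("2021-09-09 00:30:46,742", 2),
    ("2021-09-09 00:30:59,100", 4),
    ("2021-09-09 00:31:46,742", 5),
]


def max_per_minute(samples):
    """Group samples into one-minute buckets and keep the max of each bucket."""
    buckets = {}
    for stamp, value in samples:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        # Truncate to the start of the minute, like a 1m time slice.
        slice_start = ts.replace(second=0, microsecond=0)
        if slice_start not in buckets or value > buckets[slice_start]:
            buckets[slice_start] = value
    return buckets


per_minute = max_per_minute(samples)
for slice_start, value in sorted(per_minute.items()):
    print(slice_start, value)
```

The output has one row per time slice and one column per counter, which suits a line chart over time but not a pie chart of the current status.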

Disclaimer: I am currently employed by Sumo Logic.

Grzegorz Oledzki
  • Thanks for the suggestion! However, when I try this I get the following warning: ``` The data from your query cannot be plotted in the panel type selected. Please change the panel type. ``` and the pie chart shows only the first value ("running_tasks") I suspect that for it to plot the values, the "running_tasks" etc should be _rows_, not _columns_ in the resultant table, but I'm stuck at how to make it. P.S.: Line chart over the time series works OK, but I'd like to show the _current status_ as a pie chart, not over the time. – Denethor Sep 13 '21 at 09:57
  • @Denethor can you paste your query here by any chance? It sounds like you have too many aggregates for a pie chart. You can only have one, I believe (unless I'm not following you). – the-nick-wilson Sep 13 '21 at 15:19
  • Sure, it's like this: ``` _sourcecategory = "Map-ConflationService" AND _sourcehost = "173.225.29.126" | parse "INFO - *" as jsonMessage | json auto | max(running_tasks) as r, max(cancelled_tasks) as c, max(queued_tasks) as q by _timeslice ``` Beats me how to squeeze these 3 numbers into the pie.. – Denethor Sep 13 '21 at 15:52
  • Ooooh I see now. I'll post an answer that maybe will help... – the-nick-wilson Sep 13 '21 at 22:32
0

Below is a pure Python solution that will help you plot the data.

The output (entries) is a dict where the key is the time stamp and the value is a dict that contains the interesting info. log_lines holds a collection of log messages and is used as the input.

import json
import pprint

log_lines = [
    '2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }',
    '2021-09-09 00:31:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 5, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }'
]
entries = dict()

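for line in log_lines:
    # The timestamp is everything before the " [Timer-0]" thread marker.
    date = line[:line.find('[') - 1]
    # The JSON payload starts at the first opening brace.
    data = json.loads(line[line.find('{'):])
    # Keep only the per-status task counters, defaulting to 0 if one is missing.
    sub_set = {k: data.get(k, 0) for k in
               ["running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks"]}
    entries[date] = sub_set
pprint.pprint(entries)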

output

{'2021-09-09 00:30:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 2},
 '2021-09-09 00:31:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 5}}
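
To get from these entries to pie-chart slices, one possible next step is sketched below. It assumes that only the most recent log line matters (the "current status") and that statuses with a count of 0 can be left out of the chart. The `pie_slices` helper is hypothetical, and `entries` repeats the output shown above so the block runs on its own.

```python
# Same structure as the `entries` dict printed above.
entries = {
    "2021-09-09 00:30:46,742": {"cancelled_tasks": 3, "failed_tasks": 0,
                                "finished_tasks": 0, "queued_tasks": 0,
                                "running_tasks": 2},
    "2021-09-09 00:31:46,742": {"cancelled_tasks": 3, "failed_tasks": 0,
                                "finished_tasks": 0, "queued_tasks": 0,
                                "running_tasks": 5},
}


def pie_slices(entries):
    """Return the latest timestamp and each non-zero status as a percentage."""
    # Timestamps in this fixed format sort chronologically as plain strings.
    latest = max(entries)
    counts = entries[latest]
    total = sum(counts.values())
    if total == 0:
        return latest, {}
    shares = {status: 100 * n / total for status, n in counts.items() if n > 0}
    return latest, shares


latest, shares = pie_slices(entries)
print(latest, shares)
```

The keys and values of `shares` can then be handed to a charting library as labels and sizes, for example `matplotlib.pyplot.pie(list(shares.values()), labels=list(shares.keys()))` if matplotlib is available.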
balderman