I'm working with a simplified example in which workers can go through multiple lifecycles, performing tasks during each one. (This is similar to the example of users logging into different sessions and running shell commands given in https://community.splunk.com/t5/Splunk-Search/Any-example-for-MAP-command/m-p/88473).

When a task is started, a taskID and a lifecycleID are logged. However, I would also like to look up the corresponding workerID, which was logged together with the lifecycleID on an earlier log line when the lifecycle started.

Consider the following example data:

{
  "level": "info",
  "lifecycleID": "af331787-654f-441f-ac06-21b6b7e0c984",
  "msg": "Started lifecycle",
  "time": "2022-04-02T21:15:38.07991-07:00",
  "workerID": "c51df20b-f157-4002-8292-4583ebd3ba9e"
}
{
  "level": "info",
  "lifecycleID": "af331787-654f-441f-ac06-21b6b7e0c984",
  "msg": "Started task",
  "taskID": "9de93d09-5e6e-4648-9488-dda0e3e58765",
  "time": "2022-04-02T21:15:38.181107-07:00"
}
{
  "level": "info",
  "lifecycleID": "03d2148c-b697-4d8e-a3ca-f0fb68d2bbb9",
  "msg": "Started lifecycle",
  "time": "2022-04-02T21:15:38.282264-07:00",
  "workerID": "c51df20b-f157-4002-8292-4583ebd3ba9e"
}
{
  "level": "info",
  "lifecycleID": "03d2148c-b697-4d8e-a3ca-f0fb68d2bbb9",
  "msg": "Started task",
  "taskID": "243bf757-85c6-4c6e-9eec-6d74886ec407",
  "time": "2022-04-02T21:15:38.383176-07:00"
}
{
  "level": "info",
  "lifecycleID": "9cab44b4-5600-47b3-9acd-47b2641cb0d5",
  "msg": "Started lifecycle",
  "time": "2022-04-02T21:15:38.483304-07:00",
  "workerID": "0b82966c-cc98-48f0-9a36-a699e2cee48c"
}
{
  "level": "info",
  "lifecycleID": "9cab44b4-5600-47b3-9acd-47b2641cb0d5",
  "msg": "Started task",
  "taskID": "864819ed-208d-4d3d-96b9-1af4c4c42b08",
  "time": "2022-04-02T21:15:38.584478-07:00"
}
{
  "level": "info",
  "lifecycleID": "9cab44b4-5600-47b3-9acd-47b2641cb0d5",
  "msg": "Finished task",
  "taskID": "864819ed-208d-4d3d-96b9-1af4c4c42b08",
  "time": "2022-04-02T21:15:38.684633-07:00"
}

I would like to generate a table that shows the workerID, lifecycleID, and taskID for each of the three tasks started. So far, I've come up with:

index="workers" msg="Started task" 
| stats count by lifecycleID 
| map search="search index=workers msg=\"Started lifecycle\" lifecycleID=$lifecycleID$" 
| table workerID, lifecyleID, taskID

However, this doesn't appear to retain the lifecycleID and taskID (as it would if I omitted the map and simply counted by lifecycleID, taskID). In the resulting table, the lifecycleID and taskID columns are empty.

How can I make it such that I can display all three values in the table?

Update

I've attempted RichG's answer using a subsearch,

index=workers msg="Started lifecycle" 
[ search index="workers" msg="Started task" 
  | stats count by lifecycleID
  | fields lifecycleID
  | format ]
| table workerID, lifecyleID, taskID

but it generates output identical to that of my own attempt using map, i.e. without the lifecycleID or taskID.

Kurt Peek

2 Answers

Try using a subsearch instead of map. The subsearch below (the part inside square brackets) produces a list of unique lifecycleID values and formats it as (lifecycleID="foo" OR lifecycleID="bar"). That string replaces the subsearch, so the outer search returns all "Started lifecycle" events with one of the specified lifecycleIDs.

index=workers msg="Started lifecycle" 
[ search index="workers" msg="Started task" 
  | stats count by lifecycleID
  | fields lifecycleID
  | format ]
| table workerID, lifecycleID, taskID
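
To see exactly what the subsearch hands to the outer search, it can help to run the bracketed part on its own. This is only an inspection sketch that reuses the index and field names from the question:

```
index="workers" msg="Started task"
| stats count by lifecycleID
| fields lifecycleID
| format
```

This should return a single row whose `search` field holds the generated filter, roughly ( ( lifecycleID="..." ) OR ( lifecycleID="..." ) ). Only that filter string reaches the outer search. The outer search still returns just the "Started lifecycle" events, and those events have no taskID field, so the taskID column of the final table stays empty.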

Another way to combine events is the stats command, which can merge the fields of all events that share a lifecycleID into a single result row. See the run-anywhere example below.

| makeresults 
| eval data="{\"level\": \"info\",\"lifecycleID\": \"af331787-654f-441f-ac06-21b6b7e0c984\",\"msg\": \"Started lifecycle\",\"time\": \"2022-04-02T21:15:38.07991-07:00\",\"workerID\": \"c51df20b-f157-4002-8292-4583ebd3ba9e\"}
{\"level\": \"info\",\"lifecycleID\": \"af331787-654f-441f-ac06-21b6b7e0c984\",\"msg\": \"Started task\",\"taskID\": \"9de93d09-5e6e-4648-9488-dda0e3e58765\",\"time\": \"2022-04-02T21:15:38.181107-07:00\"}
{\"level\": \"info\",\"lifecycleID\": \"03d2148c-b697-4d8e-a3ca-f0fb68d2bbb9\",\"msg\": \"Started lifecycle\",\"time\": \"2022-04-02T21:15:38.282264-07:00\",\"workerID\": \"c51df20b-f157-4002-8292-4583ebd3ba9e\"}
{\"level\": \"info\",\"lifecycleID\": \"03d2148c-b697-4d8e-a3ca-f0fb68d2bbb9\",\"msg\": \"Started task\",\"taskID\": \"243bf757-85c6-4c6e-9eec-6d74886ec407\",\"time\": \"2022-04-02T21:15:38.383176-07:00\"}
{\"level\": \"info\",\"lifecycleID\": \"9cab44b4-5600-47b3-9acd-47b2641cb0d5\",\"msg\": \"Started lifecycle\",\"time\": \"2022-04-02T21:15:38.483304-07:00\",\"workerID\": \"0b82966c-cc98-48f0-9a36-a699e2cee48c\"}
{\"level\": \"info\",\"lifecycleID\": \"9cab44b4-5600-47b3-9acd-47b2641cb0d5\",\"msg\": \"Started task\",\"taskID\": \"864819ed-208d-4d3d-96b9-1af4c4c42b08\",\"time\": \"2022-04-02T21:15:38.584478-07:00\"}
{\"level\": \"info\",\"lifecycleID\": \"9cab44b4-5600-47b3-9acd-47b2641cb0d5\",\"msg\": \"Finished task\",\"taskID\": \"864819ed-208d-4d3d-96b9-1af4c4c42b08\",\"time\": \"2022-04-02T21:15:38.684633-07:00\"}" 
| eval data=split(data,"
") 
| mvexpand data 
| eval _raw=data 
| extract 
```Everything above is just to set up test data.  Omit IRL```
```Combine events that share the same lifecycleID```
| stats values(*) as * by lifecycleID 
| table workerID, lifecycleID, taskID
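
Against real data, a variant along the following lines may be closer to what is needed. It is a sketch under two assumptions that the sample data does not exercise: that the index also holds unrelated events (hence the msg filter in the base search), and that one lifecycle may start several tasks (hence the mvexpand, which splits a multivalue taskID into one row per task).

```
index=workers (msg="Started lifecycle" OR msg="Started task")
| stats values(workerID) as workerID values(taskID) as taskID by lifecycleID
| where isnotnull(taskID)
| mvexpand taskID
| table workerID, lifecycleID, taskID
```

Naming the two fields explicitly instead of using values(*) keeps unrelated fields out of the result, and the where clause drops lifecycles for which no task was started.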
RichG
  • I've tried that (see update to the question above) and in the results, the `lifecycleID` and `taskID` columns are empty, just like in the example using `map` from my original question. – Kurt Peek Apr 07 '22 at 15:17
  • See my updated answer. – RichG Apr 07 '22 at 16:37
  • As I understand it, your updated answer is `index=workers | stats values(*) as * by lifecycleID | table workerID, lifecycleID, taskID`. I'm wondering whether this would generalize to a situation in which other logs also contain `workerID`, `lifecycleID`, and `taskID` fields, as this solution no longer uses the `msg` value (either `Started lifecycle` or `Started task`) to filter the results. Can you explain in a bit more detail how this solution works? – Kurt Peek Apr 11 '22 at 13:10
  • The `stats` command just merges results. Any filtering would have to be done before that point as in `index=workers (msg="Started lifecycle" OR msg="Started task") | stats values(*) as * by lifecycleID | table workerID, lifecycleID, taskID` – RichG Apr 11 '22 at 14:27

I realized that this could be achieved by a join query:

index=workers msg="Started lifecycle" 
| join lifecycleID 
    [ search index=workers msg="Started task"] 
| table workerID, lifecycleID, taskID

The resulting table shows the workerID, lifecycleID, and taskID for each task.
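
One thing to watch for with join: by default each main-search result is joined with at most one subsearch result (max=1), so a lifecycle that started several tasks would show only one of them. The sketch below assumes that case matters. It sets max=0 to lift that limit and trims the subsearch to the two fields the join needs:

```
index=workers msg="Started lifecycle"
| join type=inner max=0 lifecycleID
    [ search index=workers msg="Started task"
      | fields lifecycleID, taskID ]
| table workerID, lifecycleID, taskID
```

The join subsearch is also subject to Splunk's subsearch result and runtime limits, so on large data sets a stats-based approach may be more reliable.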

Kurt Peek