1

Say I have a query such as

index="example" source="example.log" host="example" "ERROR 1234" 
| stats distinct_count by id

This will give me all the events with that error code per id.

I then want to combine this query to search the same log file for another string but only on the unique id's returned from the first search. Because the new string will appear on a separate event I can't just do an 'AND'.

Mads Hansen
  • 63,927
  • 12
  • 112
  • 147
hotspuds
  • 47
  • 1
  • 7
  • What should the results look like? – RichG Jan 06 '23 at 13:43
  • The output should be a flat count of the occurrences where an id had string1 in one event and string2 in another event – hotspuds Jan 06 '23 at 14:04
  • `stats count by` is *much* faster than `stats distinct_count by`: especially since you're only getting a *number* as a result (unless you really really need to know just the unique instances of the `by` clause – warren Jan 09 '23 at 18:15

3 Answers3

1

There are a few ways to do that, including using subsearches, join, or append, but those require multiple passes through the data. Here is a way that makes a single pass through the index.

index=example source="example.log" ("ERROR 1234" OR "ERROR 5678")
``` Check for the presence of each string in the event ```
| eval string1=if(searchmatch("ERROR 1234"), 1, 0)
| eval string2=if(searchmatch("ERROR 5678"), 1, 0)
``` Count string occurrences by id ```
| stats sum(string1) as string1, sum(string2) as string2 by id
``` Keep only the ids that have both strings ```
| where (string1 > 0 AND string2 > 0)
RichG
  • 9,063
  • 2
  • 18
  • 29
  • 1
    for others' edification, `| stats sum(*) as * by ...` is a much simpler (and more flexible) approach when you have a potentially-unknown-size list of things you want to `stats` across :) – warren Jan 09 '23 at 18:13
0

You can search for "some other string" in subsearch and then join the queries on the id:

index="example" source="example.log" host="example" "ERROR 1234"  
| join id [search index="example" source="example.log" host="example" "some other string" ]
| stats distinct_count by id
Mads Hansen
  • 63,927
  • 12
  • 112
  • 147
0

Presuming your id field is the same and available in both indices, this form should work:

(index=ndxA sourcetype=srctpA id=* source=example.log host=example "ERROR 1234") OR (index=ndxB sourcetype=srctpB id=* "some other string")
| rex field=_raw "(?<first_field>ERROR 1234)"
| rex field=_raw "(?<second_field>some other string)"
| fillnull value="-" first_field second_field
| stats count by id first_string second_string
| search NOT (first_string="-" OR second_string="-")

If your id field has a different name in the other index, do a rename like this before the stats line:

| rename otherIdFieldName as id

Advantages of this format:

  • you are not limited by subsearch constraints (search must finish in 60 seconds, no more than 50k rows)
  • the Search Peers (ie Indexers) will handle all of the overhead instead of having to wait on the Search Head that initiated the search to do lots of post-processing (all the SH is doing is sending the distributed search, then a post-stats filter to ensure both first_string and second_string have the values you are looking for)
warren
  • 32,620
  • 21
  • 85
  • 124