I have two searches, both over the exact same dataset. One uses the bucket (or bin) command to group events into one-second bins and find the maximum requests in any second; the other counts the total requests, errors, etc.

The first search is something like:

sourcetype="sourcetype" 
| stats count as requests, count(eval(http_code >= 400)) as errors by app_name

The second search is something like:

sourcetype="sourcetype" 
| bucket span="1s" _time 
| stats count by _time, app_name 
| stats max(count) as max_tps by app_name

My first idea was to use appendpipe, like this:

sourcetype="sourcetype" 
| appendpipe [bucket span="1s" _time | stats count by _time, app_name | stats max(count) as max_tps by app_name] 
| stats count as requests, count(eval(http_code >= 400)) as errors, max(max_tps) by app_name

This produced the expected output, but it is extremely slow: about 5x slower than running both searches separately one after the other, when I would expect it to be quicker since the data is only retrieved once.

I also tried appendcols, but had difficulty matching up the rows correctly. Alternatively, I could use a join.
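For reference, the join version I got working looks roughly like this (a sketch assuming the same sourcetype and field names as above; it still uses a subsearch, and subsearch result limits can silently truncate the joined data):

sourcetype="sourcetype" 
| stats count as requests, count(eval(http_code >= 400)) as errors by app_name 
| join type=left app_name [ search sourcetype="sourcetype" 
  | bucket span="1s" _time 
  | stats count by _time, app_name 
  | stats max(count) as max_tps by app_name ]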

However, it is apparently best to avoid subsearches entirely and just use stats, but how can I do that when the bucket command changes the shape of the dataset?

It seems like it should be possible to avoid subsearches since the primary dataset is exactly the same.

asked by JosephCHW (question edited by PM 77-1)
  • What do you want the final result to look like? – RichG Oct 24 '22 at 14:14
  • @RichG hi, I would like the final result to be rows with app_name, requests, errors, max_tps all at once. If I do each search individually, I get app_name with total requests and total errors in the first search, and I get app_name and max_tps in the second search, but I want them all at once, since the source data is the same. I have achieved this using join with a subsearch, but I was wondering if it could be made more efficient and reliable by avoiding a subsearch. Thanks. – JosephCHW Oct 24 '22 at 15:24
  • `_time` is *already* binned into one-second increments. Did you mean to bin it to 1 *minute*? – warren Oct 24 '22 at 19:51

1 Answer


I think a subsearch may be unavoidable. While both queries start with the same dataset, they quickly diverge into separate transformations, so there is little code to share between them.

appendcols won't work in this case for the reason you discovered and because it's rarely the answer to a Splunk problem.

appendpipe is harder to explain, but in short it runs its subpipeline against the results already in the pipeline and appends that output to them; it has limited applications (and this isn't one of them).

This is the best I could do. It's better than a join, but still uses a subsearch.

index=foo ```Always specify an index``` sourcetype="sourcetype" 
| stats count as requests, count(eval(http_code >= 400)) as errors by app_name
| append [ index=foo sourcetype="sourcetype" 
  | bucket span="1s" _time 
  | stats count by _time, app_name 
  | stats max(count) as max_tps by app_name
]
| stats values(*) as * by app_name
| table app_name requests errors max_tps
RichG