I would like to run a Splunk query over a long time range (e.g., months or years), but the volume of data is large enough that a search is only practical over a few hours or days.
However, for the question I want to answer in Splunk, I would be satisfied with a uniform or statistically unbiased sample of the data. In other words, I would rather have the query return N events spread out over the past month than N consecutive events.
One approach I considered was to search only events with date_minute=0, which quickly narrows the search to roughly 1/60th of the events. That helps, but it is not very flexible.
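To make the date_minute=0 idea concrete, here is a minimal Python sketch (not Splunk itself) that simulates it on synthetic timestamps. The helper names `synthetic_events` and `minute_filter` are hypothetical, and the events are assumed to arrive uniformly in time, which real data may not do.

```python
import random
from datetime import datetime, timedelta


def synthetic_events(n, start, days, seed=0):
    """Return n timestamps drawn uniformly from [start, start + days).

    Assumption: arrivals are uniform in time; this is illustrative data only.
    """
    rng = random.Random(seed)
    span_seconds = days * 24 * 60 * 60
    return [start + timedelta(seconds=rng.random() * span_seconds) for _ in range(n)]


def minute_filter(events, minutes=(0,)):
    """Keep events whose minute-of-hour is in `minutes` (mimics date_minute=0)."""
    wanted = set(minutes)
    return [t for t in events if t.minute in wanted]


events = synthetic_events(60_000, datetime(2024, 1, 1), days=30)
sample = minute_filter(events)

fraction = len(sample) / len(events)
days_covered = len({t.date() for t in sample})
print(f"kept {len(sample)} of {len(events)} events "
      f"(fraction {fraction:.4f}), covering {days_covered} days")
```

Under these assumptions the filter keeps close to 1/60 of the events and still touches every day in the range. The only rates available this way are multiples of 1/60 (by listing more minute values), and if events cluster at particular minutes, for example jobs scheduled on the hour, the sample would no longer be unbiased.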
Is there a better way to sample events efficiently in Splunk?