2

I have a index named Events

It contains a bunch of different events, all events have a property called EventName. Now I want to do a query where I return everything that matches the following:

IF AccountId exists in event with EventName AccountCreated AND there is at least 1 event with EventName FavoriteCreated with the same AccountId -> return all events where EventName == AccountCreated

Example events:

AccountCreated

{
   "AccountId": 1234,
   "EventName": "AccountCreated",
   "SomeOtherProperty": "Some value",
   "Brand": "My Brand",
   "DeviceType": "Mobile",
   "EventTime": "2020-06-01T12:13:14Z"
}

FavoriteCreated

{
   "AccountId": 1234,
   "EventName": "FavoritesCreated,
   "Brand": "My Brand",
   "DeviceType": "Mobile",
   "EventTime": "2020-06-01T12:13:14Z"
}

Given the following two events, I would like to create 1 query that returns the AccountCreated event.

I've tried the following but it does not work, surely I must be missing something simple?

index=events EventName=AccountCreated 
  [search index=events EventName=FavoriteCreated | dedup AccountId | fields AccountId]
| table AccountId, SomeOtherProperty

Im expecting ~6000 hits here but Im only getting 2298 events. What am I missing?

UPDATE Based on the answer given by @warren below, the following query works. The only problem is that it's using a JOIN which limits us to 50K results from the subsearch. When running this query I get 5900 results in total = Correct.

index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, EventName
| fields - count
| join AccountId 
    [ | search index=events EventName=FavoriteCreated AccountId=*
    | stats count by AccountId ]
| fields - count
| table AccountId, EventName

I then tried to use his updated example like this but the problem seems to be that it returns FavoriteCreated events instead of AccountCreated. When running this query I get 25 494 hits = Incorrect.

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName by AccountId
| eval EventName=mvindex(EventName,-1)
| search EventName="FavoriteCreated"
| table AccountId, EventName

Update 2 - WORKING @warren is awesome, here is a full working query that only returns data from the AccountCreated events IF 1 or more FavoriteCreated event exists.

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats 
    values(Brand) as Brand,
    values(DeviceType) as DeviceType,
    values(Email) as Email,
    values(EventName) as EventName
    values(EventTime) as EventTime,
    values(Locale) as Locale,
    values(ClientIp) as ClientIp
  by AccountId
| where mvcount(EventName)>1
| eval EventName=mvindex(EventName,0)
| eval EventTime=mvindex(EventTime,0)
| eval ClientIp=mvindex(ClientIp,0)
| eval DeviceType=mvindex(DeviceType,0)
JOSEFtw
  • 9,781
  • 9
  • 49
  • 67

2 Answers2

1

You found, perhaps, one factor of your issues - that subsearches are capped at 50,000 (when doing a join) events (or 60 seconds run time (or 10,000 results when you use a "normal" subsearch)).

Start by dumping dedup in favor of stats:

index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
| fields - count
| search 
    [ | search index=events EventName=FavoriteCreated AccountId=*
    | stats count by AccountId 
    | fields - count]
<rest of search>

If that doesn't get you where you want to be (ie, you still have too many results in your subsearch), you can try join:

index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
| fields - count
| join AccountId 
    [ | search index=events EventName=FavoriteCreated AccountId=*
    | stats count by AccountId ]
| fields - count
<rest of search>

There are yet more ways of doing what you're looking for - but these two should get you a long ways toward your goal

Here's a join-less approach which'll show only "FavoriteCreated" events:

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
| eval EventName=mvindex(EventName,-1)
| search EventName="FavoriteCreated"

And here's one that shows "FavoriteCreated" only if there was also an "AccountCreated" event in the same timeframe:

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
| where mvcount(EventName)>1

And if you want to 'pretend' the values() didn't happen (ie, throw-out the "favoriteCreated" entry), add this:

| eval EventName=mvindex(EventName,0)
warren
  • 32,620
  • 21
  • 85
  • 124
  • Thank you so much, will try this out first thing in the morning!! :) – JOSEFtw May 28 '20 at 15:16
  • 1
    I tried it now, the first one "did not work" (same problem as before with truncated result). However, the second one worked like a charm!! Thank you! – JOSEFtw May 28 '20 at 15:25
  • When using the join, am I correct in my assumption that I've only raised the limit of records returned by the subsearch from 10000 to 50000? – JOSEFtw May 28 '20 at 19:44
  • 1
    @JOSEFtw - yes (note docs.splunk links provided). With a better picture of your use case, we may be able to give you an even better (and `join`-less) answer, too :) – warren May 28 '20 at 19:53
  • Yeah sorry, I missed them! So I can give you my full use case here. Let's say that I have 10 million FavoriteCreated events and 10 million AccountCreated events. I need to be able to select all AccountCreated events where the user has at least 1 FavoriteCreated event during a specific time period (let's say past six months just to give you an idea of the data size). Do you need anymore details? – JOSEFtw May 28 '20 at 19:56
  • Basically we use that information to decide if a user has been active or not during the last 6 months. And since the data returned from the subsearch can be a lot bigger than 50K I think you understand my problem? A join will not work Im afraid... – JOSEFtw May 28 '20 at 19:57
  • 1
    @JOSEFtw - try the additional search option I've provided – warren May 28 '20 at 19:59
  • 1
    thank you so much for your time and help. I've updated my question with examples on the queries I ran after your suggestions, the last one without the JOIN does not seem to work, it seems like it returns FavoriteCreated events instead of AccountCreated. Maybe it's something really small I need to change? This is my first day working with Splunk so please bare with me... :) – JOSEFtw May 28 '20 at 20:12
  • 1
    @JOSEFtw - the last one is *supposed* to only return "FavoriteCreated" events: that's what you said you wanted to find :) ...I've added another option below that for you, too – warren May 28 '20 at 20:22
  • Ah sorry, maybe im bad at expressing myself. Basically: I want to get ALL AccountCreated events WHERE at least 1 FavoriteCreated event exists with the same AccountId as the AccountCreated event! :) I want the exact same result as the JOIN query you posted, but without the join! – JOSEFtw May 28 '20 at 20:24
  • @JOSEFtw - did you try the `join`less search at the end? – warren May 29 '20 at 05:01
  • Yes I did! The problem with that one is that it returns FavoriteCreated events. I want it to return AccountCreated events :) IF AccountCreated event exists AND at least 1 FavoriteCreated event exists -> return said AccountCreated event :) – JOSEFtw May 29 '20 at 05:26
  • @JOSEFtw - then it *sounds* like you *didn't* try the `join`-less search listed at the end: the one that only returns events that have **both** FavoriteCreated ***and*** AccountCreated events :) – warren May 29 '20 at 17:37
  • I DID run it! And it works...BUT ( :) )...it's combining the values, so instead of getting just the AccountCreated event data I get a combined row of both the AccountCreated and FavoriteCreated event. Do you know if I can filter that somehow to just contain the data from AccountCreated? Thank you so much for your time/help! :) – JOSEFtw May 30 '20 at 09:38
  • @JOSEFtw - it's only combining the values of one field ... just add this line after the `|where`: `| eval EventName=mvindex(EventName,0)` – warren May 31 '20 at 17:53
  • Thank you SO MUCH! I updated my question with a working query at the bottom! You see the filtering I do at the end, is it possible to do that for "all fields"? Something like ```| eval *=mvindex(*,0)```? Or do I always need to specify eacg field explicitly? – JOSEFtw Jun 01 '20 at 06:38
  • @JOSEFtw - why would you filter on anything ***but*** `EventName`? The only field which will "have FavoriteCreated" data in it is `EventName` – warren Jun 01 '20 at 14:52
  • no that's not correct. For example the field EventTime exists in both AccountCreated and FavoriteCreated. If I have 1 AccountCreated and 4 FavoriteCreated for 1 specific AccountId, EventTime will contain 5 values. When I then add eval EventTime=mvindex(EventTime,0), I only get 1 value (the one from AccountCreated) – JOSEFtw Jun 01 '20 at 17:40
  • @JOSEFtw - that's additional data you hadn't shared earlier :) – warren Jun 01 '20 at 18:00
  • 1
    Yeah sorry I updated my question when I discovered it. But thank you so much for your help, it's working now! :) – JOSEFtw Jun 01 '20 at 18:26
0

Turns out that Splunk truncates the SubSearch result if its bigger than 10 000 results... Splunk search feedback enter image description here

JOSEFtw
  • 9,781
  • 9
  • 49
  • 67