
I have two tables in Sentinel with data in them. I have field A in Table A that is a number. And I have two fields B and C in Table B which are also numbers but they represent a range. Like Table A field A contains the number '9', Table B field B contains the number '3' and Table B field C contains number '18'.

I would like to go through all of the entries in Table A and find the matching logs for these entries in Table B. An entry is matching if the value from field A is in the range of field B and C (between B and C). So, in the example above the number 9 is between 3 and 18, so these two entries from the two tables would match.

Because they are not exact matches I can't use join to find matching entries.

Is there a way to do this with KQL (Kusto) somehow? I tried multiple solutions but none has worked so far. I tried to use user-defined functions, but I got a 'Tabular expression is not expected in the current context' error.

Erik Oppedijk

1 Answer


A naive way would be to use a Cartesian product and apply a filter; here is an example:

let A = datatable(a:int) [9, 25, 2];
let B = datatable(b:int, c:int) [3,13, 1,2];
A
| extend dummy = 1
| join kind=inner (B | extend dummy = 1) on dummy
| where a between (b .. c)
| project-away dummy*
Output:

a  b  c
9  3  13
2  1  2
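As a sketch of an alternative (not from the answer above): when the range boundaries are integers and the ranges are narrow, each row of B can be expanded into its discrete values with `range()` and `mv-expand`, turning the problem into a regular equi-join. This avoids the full Cartesian product, but the expansion grows with the range width, so it is only practical for small ranges.

```kusto
let A = datatable(a:int) [9, 25, 2];
let B = datatable(b:int, c:int) [3,13, 1,2];
B
// expand each [b, c] range into every integer it contains
| extend a = range(b, c, 1)
| mv-expand a to typeof(int)
// now an exact-match join is enough
| join kind=inner A on a
| project a, b, c
```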
Avnera
  • First I was afraid this was not going to work. One of my tables has 4 million entries; the other depends on a lot of things, but usually has between 10k and 100k. So a plain Cartesian product did not work. With 300-400 entries in the second table the query finished in under 10 minutes, but above 500 it started to fail (Sentinel times out after 10 minutes). So I modified this a little: instead of using the same dummy value for everything, I created buckets. Practically, I rounded the values in field A and field B down to a round number. This way it was not an everything-to-everything – user7696812 Jan 09 '22 at 00:51
  • ... everything-to-everything match; I only had matches for values that were close to each other. In the end I had far fewer entries, so I could process them effectively. Instead of 500 entries in table B timing out, I could process 15k events in table B in less than 10 seconds. Thanks for the idea, it definitely helped. And I doubt there is any other way to do it. – user7696812 Jan 09 '22 at 00:53
    This is a very cool idea and definitely helps with performance, but it depends on your domain knowledge to ensure that the categorization of "a" and "b" to different groups is correct and that you do not miss valid cases. – Avnera Jan 09 '22 at 05:41
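The bucketing idea described in the comments can be sketched as follows. The bucket size of 10 is an assumption; because a range may span several buckets, each row of B is expanded to every bucket its range overlaps, so no valid match is lost. The final `between` filter keeps only the true range matches.

```kusto
let bucketSize = 10;  // assumed bucket width; tune to the typical range width
let A = datatable(a:int) [9, 25, 2];
let B = datatable(b:int, c:int) [3,13, 1,2];
A
// bucket each value of a
| extend key = bin(a, bucketSize)
| join kind=inner (
    B
    // expand each range to every bucket it touches
    | extend key = range(bin(b, bucketSize), bin(c, bucketSize), bucketSize)
    | mv-expand key to typeof(long)
) on key
// the bucket join only narrows candidates; the exact check is still needed
| where a between (b .. c)
| project a, b, c
```

Rows are only compared when their buckets coincide, which is what shrinks the join from everything-to-everything down to near-neighbors.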