0

I have the following classes:


class Event(Model):
    ...

class IOCType(Model):
    name = CharField(max_length=50)

class IOCInfo(Model):
    event = ForeignKey(Event, on_delete=CASCADE, related_name="iocs"
    ioc_type = ForeignKey(IOCType, on_delete=CASCADE)
    value = CharField(max_lenght=50)

Each event has one or several IOCs associated with it, which are stored in the IOCInfo table.

This is how my IOCInfo table looks like after creating some events:

id value event_id ioc_type_id
1 some-value1 eventid1 4
2 some-value2 eventid1 8
3 some-value3 eventid1 8
4 some-value4 eventid1 1
5 some-value3 eventid2 8
6 some-value1 eventid2 1
7 some-value2 eventid3 8
8 some-value3 eventid4 8

What I want to do is to take an event, compare its IOCInfo with those of other events and get back those events that match.

This is what I have done so far and it works, but I fear that as the database grows and the app has more users this query will end up being a bottleneck


def search_matches(event):
    matches = Event.objects.none() 
    for ioc in event.iocs.all():
        matches |= Event.objects.filter(
            iocs__ioc_type=ioc.ioc_type, iocs__value=ioc.value
        )
    return matches.exclude(event=event.id)

If I pass the eventid2 to The above function, it will return eventid1 and eventid4 as expected.

Any suggestions to improve this function using any django method would be appreciated.

mprat
  • 2,451
  • 15
  • 33
wisvem
  • 5
  • 1
  • 3

3 Answers3

0

If I understand correctly:

  1. Filter iocs that correspond to given event:

    iocs = IOCInfo.objects.filter(event=event)
    
  2. Get events that has the same iocs (except the given one)

    events = Event.objects.filter(iocs__in=iocs).exclude(pk=event.pk)
    

That will result in a single pretty efficent query.

UPD:

To look into the fields of ioc - replace point 2 with the following:

events = Event.objects.filter(
    iocs__ioc__type__in=iocs.values_list('type', flat=True),
    iocs__ioc__value__in=iocs.values_list('value', flat=True)
).exclude(pk=event.pk)
Egor Wexler
  • 1,754
  • 6
  • 22
  • I tried your function but it always returns an empty queryset, and it's because your query is comparing the ID of the iocinfo. This is what i looks like the query of your function `SELECT "event"."id', ... FROM "event" INNER JOIN "iocinfo" ON ("event"."id" = "iocinfo"."event_id") WHERE ("iocinfo"."id" IN (SELECT U0."id" FROM "iocinfo" U0 WHERE U0."event_id" = 'eventid2') AND NOT ("event"."id" = 'eventid2'))` – wisvem Dec 16 '22 at 17:49
  • Hi, I had to make a little change to make your function work and the avg is around `0,004 seconds` for now is the best solution, the next one is around `0,010 seconds` and my original function is `0,019 seconds` – wisvem Dec 19 '22 at 15:59
  • 1
    I forgot to say thank you. Thank you very much! – wisvem Dec 23 '22 at 22:05
0

Let's see if I understand...

def search_matches(event):
    get_values = lambda data, key: [x[key] for x in data]
    data_to_filter = evetn.iocs.all().values('ioc_type', 'value').order_by('id')

    return Event.objects.filter(
        iocs__ioc_type__in=get_values(data_to_filter, ‘ioc_type’),
        iocs__value__in=get_values(data_to_filter, ‘value’)
    ).exclude(pk=event.id)
SoundWave
  • 91
  • 7
  • I tried your function but it always returns an empty queryset, and it's because your query is comparing the ID of the iocs. – wisvem Dec 16 '22 at 17:46
  • Fix: It is comparing IOCInfo ID, That field is unique so it will never return anything. – wisvem Dec 16 '22 at 17:56
  • got it @wisvem, I updated the reply, take a look at it and let me know – SoundWave Dec 17 '22 at 09:27
  • Now it is working. The avg is `0.010 seconds` . For now @egor-wexler solution is the best one with `0,004 seconds` avg. Anyway thank you very much – wisvem Dec 19 '22 at 16:03
  • Cool, I saw @egor-wexler’s reply and I learned something new! :) – SoundWave Dec 20 '22 at 00:55
0

first of all you should get the values of ioc_type and values for the IOCInfo of the eventid specified

event_filters=IOCInfo.objects.filter(event=event).values(iocs__ioc_type=F("ioc_type"),iocs__value=F("value"))

then you should construct a Q object to filter the events according to either condition of ioc_type and value satisfied finally,use this Q object to filter your event

filtering_kwargs=Q()
for filter_ in event_filters:
    filtering_kwargs|=Q(**filter_)
matching_events=Event.objects.filter(filtering_kwargs).exclude(id=event.id)

Hope that solves your problem as db will be queried only twice no matter how large your data is

combining everything in a single function becomes

from django.db.models import F
def search_matches(event):
    event_filters=IOCInfo.objects.filter(event=event).values(iocs__ioc_type=F("ioc_type"),iocs__value=F("value"))
    filtering_kwargs=Q()
    for filter_ in event_filters:
        filtering_kwargs|=Q(**filter_)
    matching_events=Event.objects.filter(filtering_kwargs).exclude(id=event.id)
    return matching_events
wisvem
  • 5
  • 1
  • 3
Dante
  • 165
  • 8
  • Hi, I tried your function but I had to change the kwargs to normal args to make it work but i got another error because the query is trying to filter `ioc_type` and `value` but those fields aren't part of the Event model. `matching_events=Event.objects.filter(filtering_kwargs).exclude(id=event.id)` – wisvem Dec 19 '22 at 15:30
  • i've edited the answer you can now use `event_filters=IOCInfo.objects.filter(event=event).values(iocs_ioc_type=F("ioc_type"),iocs_value=F("value"))` – Dante Dec 20 '22 at 06:32
  • Tried and the avg is `0,007 seconds`, for now Egor proposal is the best one. Thank you very much for your time and fr the explanation – wisvem Dec 21 '22 at 14:40
  • I appreciate... @egor-wexler's answer is unbelievable. – Dante Dec 21 '22 at 15:47