0

I have a list of custom Python objects and need to search for the existence of specific objects within that list. My concern is the performance implications of searching large lists, especially frequently.

Here's a simplified example using a custom Person class with attributes name and age:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

Currently, I'm using a list comprehension and the any() function to check if a person with a specific name and age exists in the list:

if any(p.name == "Bob" and p.age == 25 for p in people):
    print("The person exists.")

Is there a more efficient way to search for the existence of specific custom objects within large Python lists?

SyntaxNavigator
  • 312
  • 3
  • 10
  • Why don't you use sets ? Lists are very bad in finding an element, since they are not designed for this. As such, if you keep using lists, you'll always get a linear search time. On the contrary, sets are especially designed to retrieve an element in `O(1)` operation. This implies creating a hash function for your class though. – RandomGuy Apr 26 '23 at 13:48
  • Can you create a set of `p.name` and search it if you just need a yes/no otherwise if you also need the object for the given name if it exists, I would build a dictionary with keys of `p.name` and values of `Person()` – JonSG Apr 26 '23 at 13:58
  • Honestly I'd throw it into a SQLite in memory database and use an ORM. A couple of indexes will speed things up, and even without indexes, searching in C is a win. – btilly Apr 26 '23 at 14:09
  • @RandomGuy If you implement \_\_hash__ don't you also need to implement \_\_eq__ ? – DarkKnight Apr 26 '23 at 15:13

5 Answers5

2

if the fields you are going to be filtering on are low, then you could first create a hash set of those fields. then knowing if a person named "bob" exists is O(1) instead of O(n) in case of list iteration

1

I mean technically speaking if equality checks are set up for it, you could create the Person object you're looking for and just see if it's in the list.

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

if Person("Bob", 25) in people:
    print("The person exists.")

This will only work if the Person object you use in your condition and the Person object in the list evaluate to the same object. So this solution may or may not work for you.

You could also stand to gain performance by using a set inistead of a list as stated in @RandomGuy's comment.

Edit:

If you want a more appropriate search you can use a set or a dict:

people_set = {(p.name, p.age) for p in people}
people_dict = {(p.name, p.age): p for p in people}

if ("Bob", 25) in people_set:
    print("Exists")

bob = people_dict[("Bob", 25)]
Jab
  • 26,853
  • 21
  • 75
  • 114
  • 1
    This does not answer the question, which was about efficiency. – RandomGuy Apr 26 '23 at 13:57
  • @RandomGuy This is a more efficient way I don't see how this doesn't answer the question – Jab Apr 26 '23 at 13:59
  • 1
    The performance of your solution is strictly equivalent to the solution of OP, so no it does not answer the question which is about efficiency for large lists. – RandomGuy Apr 26 '23 at 14:02
  • @RandomGuy I edited my answer to include set and dict solutions – Jab Apr 26 '23 at 14:16
1

One way to improve the performance of searching for specific objects within a large Python list is to use a dictionary instead of a list. This approach involves creating a dictionary where the keys are the attributes that you want to search by, and the values are the objects themselves.

Here's an example of how this approach can be applied to your Person class:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35)]

people_dict = {}
for person in people:
    people_dict[(person.name, person.age)] = person

if ("Bob", 25) in people_dict:
    print("The person exists.")
robotiaga
  • 315
  • 2
  • 11
0

If we implement a couple of dunder methods in your class we can then use a set as follows:

class Person:
    def __init__(self, name, age):
        self._name = name
        self._age = age
    def _key(self):
        return self._name, self._age
    def __hash__(self):
        return hash(self._key())
    def __eq__(self, other):
        return isinstance(other, type(self)) and self._key() == other._key()

_set = {Person('Andy', age) for age in range(18, 100)}

print(Person('Andy', 55) in _set)

Output:

True
DarkKnight
  • 19,739
  • 3
  • 6
  • 22
0

If your Person() class is immutable you can use a frozen dataclass and a set:

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    age: int


people = {Person("Alice", 30), Person("Bob", 25), Person("Charlie", 35), Person("Mary", 47)}
print(people)
>>> {Person(name='Bob', age=25), Person(name='Charlie', age=35), Person(name='Alice', age=30), Person(name='Mary', age=47)}

print(f'{Person("Bob", 25) in people=}')
>>> Person("Bob", 25) in people=True

print(f'{Person("Mary", 44) in people=}')
>>> Person("Mary", 44) in people=False

But you will not be able to mutate them:

p = people.pop()
print(p)
>>> Person(name='Mary', age=47)

p.age = "99"
dataclasses.FrozenInstanceError: cannot assign to field 'age'
Terry Spotts
  • 3,527
  • 1
  • 8
  • 21