If you maintain your list in sorted order (which ideally means few insertions or removals, because mid-list insertion/removal is itself O(n)), you can find the set of Person objects below a given salary with the bisect module.
from bisect import bisect
from operator import attrgetter
# if you store the objects in a list...
collection = [Person("Jack", 50000, 0), ..., Person("Jane", 120000, 1)]
collection.sort(key=attrgetter('salary')) # O(n log n) initial sort
# filtering searches in O(log n) (bisect is bisect_right, so salaries equal to target are kept):
target = 100000
filtered_collection = collection[:bisect(collection, target, key=attrgetter('salary'))]
Note: The key argument to the various bisect module functions is only supported as of 3.10. In prior versions, you'd need to define the rich comparison operators for Person in terms of salary and search for a faked-out Person object, or maintain ugly separate sorted lists: one of the salaries alone, and a parallel list of the Person objects.
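A minimal sketch of that parallel-list workaround (it assumes collection is already sorted by salary, as above):

from bisect import bisect

# Pre-3.10: keep a separate sorted list of just the salaries and bisect on it;
# the index you get back applies equally to the parallel Person list.
salaries = [p.salary for p in collection]
idx = bisect(salaries, target)          # O(log n), no key argument needed
filtered_collection = collection[:idx]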
For adding individual elements to the collection, you could use bisect's insort function. Or you could just add a bunch of items to the end of the list in bulk and re-sort it on the same key as before (Python's sorting algorithm, Timsort, gets near O(n) performance when the collection is mostly in order already, so the cost is not as high as you might think).
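Roughly, the two options look like this (new_people is a placeholder for whatever batch you are adding; the key argument to insort also requires 3.10+):

from bisect import insort
from operator import attrgetter

# One at a time: O(log n) to find the insertion point, but O(n) to shift elements over.
insort(collection, Person("Jill", 75000, 2), key=attrgetter('salary'))

# In bulk: append cheaply, then re-sort once; Timsort exploits the
# already-sorted prefix, so the re-sort is close to O(n) for small batches.
collection.extend(new_people)   # new_people: hypothetical iterable of Person objects
collection.sort(key=attrgetter('salary'))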
I'll note that in practice, this sort of scenario (massive data that can be arbitrarily ordered by multiple fields) usually calls for a database; you might consider using sqlite3 (eventually switching to a more production-grade database like MySQL or PostgreSQL if needed), which, with appropriate indexes defined, would let you do O(log n) SELECTs on any indexed field; you could convert to Person objects on extraction for the data you actually need to work with. The B-trees that true DBMS solutions provide get you O(log n) effort for inserts, deletes, and selects on the indexed fields, where Python's built-in collection types make you choose: only one of insertion/deletion or searching can be truly O(log n), while the other is O(n).
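For illustration, a rough sqlite3 sketch of that approach (the table and index names are made up here, and it assumes the Person constructor used above; the index on salary is what makes the range query O(log n) rather than a full scan):

import sqlite3

conn = sqlite3.connect("people.db")
conn.execute("CREATE TABLE IF NOT EXISTS person (name TEXT, salary INTEGER, id INTEGER)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_person_salary ON person (salary)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [("Jack", 50000, 0), ("Jane", 120000, 1)])
conn.commit()

# The salary index lets SQLite seek to the range start instead of scanning every row;
# <= mirrors the bisect slice above.
rows = conn.execute("SELECT name, salary, id FROM person WHERE salary <= ?", (100000,))
filtered_collection = [Person(*row) for row in rows]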