The obvious thing to do is either sort then filter, or filter then sort.
If you have the same list every time, sorting first is obviously a win, because then you only need to sort once instead of every time. It also means you can use a binary search for the filtering instead of a linear walk (as explained in ventsyv's answer), although that probably won't pay off unless your lists are much longer than this one.
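For reference, here's a minimal sketch of that sort-once, binary-search approach (range_between is just an illustrative name, not something from ventsyv's answer):

import bisect

def range_between(sorted_values, lo, hi):
    # sorted_values must already be sorted ascending.
    # Two binary searches find the slice with lo <= x < hi in O(log n)
    # instead of walking the whole list.
    left = bisect.bisect_left(sorted_values, lo)
    right = bisect.bisect_left(sorted_values, hi)  # bisect_right would make the upper bound inclusive
    return sorted_values[left:right]

values = sorted([9.2, 1.4, 3.3, 7.9, 0.6])
print(range_between(values, 3, 8))  # [3.3, 7.9]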
If you have different lists every time, filtering first is probably a win, because the sort is probably the slow part, and you're sorting a smaller list that way.
But let's stop speculating and start testing.
Using a list of several thousand floats, about half of which are in range:
In [1591]: flist = [random.random()*10 for _ in range(5000)]
In [1592]: %timeit sorted(x for x in flist if 3 <= x < 8)
100 loops, best of 3: 3.12 ms per loop
In [1593]: %timeit [x for x in sorted(flist) if 3 <= x < 8]
100 loops, best of 3: 4 ms per loop
In [1594]: %timeit l=sorted(flist); l[bisect.bisect_left(l, 3):bisect.bisect_right(l, 8)]
100 loops, best of 3: 3.36 ms per loop
So, filtering then sorting wins; ventsyv's algorithm does make up for part of the difference, but not all of it. But of course, if we only have a single list to sort, sorting it once instead of thousands of times is an obvious win:
In [1596]: l = sorted(flist)
In [1597]: %timeit l[bisect.bisect_left(l, 3):bisect.bisect_right(l, 8)]
10000 loops, best of 3: 29.2 µs per loop
So, if you have the same list over and over, obviously sort it once.
Otherwise, you could test on your real data… but we're talking about shaving up to 22% off of something that takes milliseconds. Even if you do it many thousands of times, that's saving you under a second. Just the cost of typing the different implementations—much less understanding them, generalizing them, debugging them, and performance testing them—is more than that.
But really, if you're doing millions of operations spread over hundreds of thousands of values, and speed is important, you shouldn't be using a list in the first place, you should be using a NumPy array. NumPy can store just the raw float values, without boxing them up as Python objects. Besides saving memory (and improving cache locality), this means that the inner loop in, say, np.sort is faster than the inner loop in sorted, because it doesn't have to make a Python function call that ultimately involves unboxing two numbers; it just has to do a comparison directly.
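As a quick illustration of that unboxed storage (a minimal sketch; np.asarray and dtype are standard NumPy, the values here are made up):

import numpy as np

flist = [0.5, 3.7, 9.1]   # an ordinary Python list of boxed float objects
arr = np.asarray(flist)   # one-time copy into a contiguous buffer of raw doubles
print(arr.dtype)          # float64 -- machine doubles, no per-element Python objects
arr = np.sort(arr)        # comparisons happen in C, no Python call per pair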
Assuming you're storing your values in an array in the first place, how does it stack up?
In [1607]: flist = np.random.random(5000) * 10
In [1608]: %timeit a = np.sort(flist); a = a[3 <= a]; a = a[a < 8]
1000 loops, best of 3: 742 µs per loop
And with the array sorted once up front (b = np.sort(flist), done outside the timing):
In [1611]: %timeit c = b[3 <= b]; d = c[c < 8]
10000 loops, best of 3: 29.8 µs per loop
So, it's about 4x faster than filter-and-sort for the "different lists" case, even using a clunky algorithm (I was looking for something I could cram onto one %timeit line, rather than the fastest or most readable…). And for the "same list over and over" case, it's almost as fast as the bisect solution even without bisecting (but of course you can bisect with NumPy, too).
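For completeness, here's a minimal sketch of tidier NumPy versions of both cases (variable names are just illustrative): a single combined boolean mask for the "different lists" case, and np.searchsorted, essentially NumPy's bisect, on a pre-sorted array for the "same list" case:

import numpy as np

flist = np.random.random(5000) * 10

# "Different lists" case: one combined mask, then sort the survivors.
in_range = np.sort(flist[(flist >= 3) & (flist < 8)])

# "Same list over and over" case: sort once, then binary-search the bounds.
b = np.sort(flist)
lo = np.searchsorted(b, 3, side='left')
hi = np.searchsorted(b, 8, side='left')  # side='right' would make the upper bound inclusive
in_range_again = b[lo:hi]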