
I'm wondering whether there's a more pythonic way to get an interval from a list knowing the beginning and ending values while only traversing the list once.

Example of what I want in a not very pythonic manner (stores all names between 'Ann' and 'John' inclusive):

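names = ['Bill', 'Patrick', 'Aaron', 'Ann', 'Jane',
         'Rachel', 'Beatrix', 'John', 'Basil', 'Alice']  # example data (hypothetical)
all_names = []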
start_adding = False

for name in names:
    if name == 'Ann':
        start_adding = True
    if start_adding:
        all_names.append(name)
    if name == 'John':
        break
confused00

6 Answers


This solution traverses the list only once, and it uses only one expression and functions from the standard library! :)

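import itertools as it

l = ['Bill', 'Patrick', 'Aaron', 'Ann',
     'Jane', 'Rachel', 'Beatrix',
     'John', 'Basil', 'Alice', ]
l = iter(l)  # one shared iterator, so every element is read at most once

print(
    list(
        it.chain(
            # iter(callable, sentinel) calls next(l) until it returns 'Ann';
            # dropwhile(lambda _: True, ...) drains that part and yields nothing.
            it.dropwhile(lambda _: True, iter(lambda: next(l), 'Ann')),
            # Continue on the same iterator, yielding items until 'John' comes up.
            iter(lambda: next(l), 'John')
        )
    )
)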

Output (note that the endpoints 'Ann' and 'John' themselves are excluded):

['Jane', 'Rachel', 'Beatrix']

Demo: http://ideone.com/eOLG6o
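The one-expression version above excludes both endpoints, whereas the question's loop keeps them. For comparison, a more readable single-pass sketch that keeps both endpoints could combine `itertools.dropwhile` with an early `return`; the function name `between_inclusive` is made up for illustration.

```python
from itertools import dropwhile


def between_inclusive(items, start, end):
    """Yield items from the first `start` through the next `end`, inclusive.

    Single pass: stops reading the input as soon as `end` has been yielded.
    """
    # Skip everything before the first occurrence of `start`.
    for item in dropwhile(lambda x: x != start, items):
        yield item
        if item == end:
            return


names = ['Bill', 'Patrick', 'Aaron', 'Ann',
         'Jane', 'Rachel', 'Beatrix',
         'John', 'Basil', 'Alice']
print(list(between_inclusive(names, 'Ann', 'John')))
# ['Ann', 'Jane', 'Rachel', 'Beatrix', 'John']
```

If `start` never occurs the result is empty; if `end` never occurs, everything from `start` to the end of the input is yielded.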

Gill Bates
  • This is a really great answer. You could probably drop all the surrounding stuff (print, list etc.) or move it into a demo so people can quickly get the gist of it. – Asad Saeeduddin May 08 '15 at 09:15
  • This answer best fits my question as I phrased it in the OP, since it uses a lot of Python-specific idioms, but it's worth noting that it's also the slowest of the ones posted here (from the testing I've done) – confused00 May 08 '15 at 09:47
  • @confused00 It's actually a pretty crazy solution; don't use it in production, it will confuse people. – Gill Bates May 08 '15 at 09:59

I don't know about Pythonic, but here's a generator that will traverse the list only once and produce the middle values.

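def get_between(names, first, last):
    seen_first = seen_last = False
    for n in names:
        seen_last = seen_last or n == last      # once `last` is seen, never yield again
        if seen_first and not seen_last:
            yield n                             # strictly between `first` and `last`
        seen_first = seen_first or n == first   # updated after the check, so `first` is skipped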

It just takes the naive approach of remembering whether it has seen the first and last names you're interested in, and yielding values once the first has been seen but the last hasn't. You could probably add an early exit to make it better.

Here's a demo: http://ideone.com/ovnMX2
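The early exit suggested above could be sketched like this (the name `get_between_early_exit` is hypothetical). It yields exactly the same items as the flag-based generator, including ignoring everything once `last` has appeared, but it returns immediately at `last` instead of scanning the rest of the input.

```python
def get_between_early_exit(names, first, last):
    """Yield the items strictly between `first` and `last`, stopping at `last`."""
    seen_first = False
    for n in names:
        if n == last:
            return          # early exit: nothing after `last` is ever yielded
        if seen_first:
            yield n
        elif n == first:
            seen_first = True


names = ['Bill', 'Patrick', 'Aaron', 'Ann',
         'Jane', 'Rachel', 'Beatrix',
         'John', 'Basil', 'Alice']
print(list(get_between_early_exit(names, 'Ann', 'John')))
# ['Jane', 'Rachel', 'Beatrix']
```

Because it stops at `last`, an iterator passed in is left positioned just after `last`, so the remaining items can still be read from it.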

Asad Saeeduddin

Here's a slightly more verbose, but more Pythonic (and, theoretically, more performant) way to do it, using a generator and `yield`:

def between_generator(items, start, end):
    yield_item = False
    for item in items:
        if item == start:
            yield_item = True
        if yield_item:
            yield item
        if item == end:
            return  # ends the generator; raising StopIteration here is an error since Python 3.7 (PEP 479)

# usage
names = ['Bill', 'Ann', 'Jane', 'John', 'Alice']  # example data
start, end = 'Ann', 'John'
for item in between_generator(names, start, end):
    print(item)

# converting to a list for multiple use
items = list(between_generator(names, start, end))

This basically creates a lightweight, one-way cursor over the list: iterating over it yields all the items from start to end, inclusive. To use the filtered results more than once, pass the generator to the list constructor to build a new list.
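The "one-way" part is easy to see in practice: a generator can be consumed only once, and a second pass over the same generator object yields nothing. A small sketch, using the same inclusive logic as the generator above with made-up example names:

```python
def between_generator(items, start, end):
    # Inclusive: yields everything from `start` through `end`.
    yield_item = False
    for item in items:
        if item == start:
            yield_item = True
        if yield_item:
            yield item
        if item == end:
            return


names = ['Bill', 'Patrick', 'Aaron', 'Ann', 'Jane',
         'Rachel', 'Beatrix', 'John', 'Basil', 'Alice']

gen = between_generator(names, 'Ann', 'John')
print(list(gen))  # ['Ann', 'Jane', 'Rachel', 'Beatrix', 'John']
print(list(gen))  # [] because the first pass exhausted the generator

# Materialise once if the results are needed more than once.
results = list(between_generator(names, 'Ann', 'John'))
```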

You might want to consult a question about generators here on SO for more explanation.

J0HN

Yes there is ;-)

Use the list's `index` method (docs):

r = list(range(10))
start = r.index(3)
end = r.index(7)
sub_list = r[start:end]

print(sub_list)
# [3, 4, 5, 6]

# if you want to include the end value as well (the start value is already included)
sub_list2 = r[start:end+1]

print(sub_list2)
# [3, 4, 5, 6, 7]
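One thing the snippet above doesn't handle is a missing value: `list.index` raises `ValueError` when the value is not in the list. A sketch that wraps both lookups, searches for the end value only from the start position onward, and returns an empty list when either value is absent (the empty-list policy and the name `slice_between` are assumptions; re-raising would be just as valid):

```python
def slice_between(items, start, end):
    """Return the inclusive slice from `start` to the first `end` at or after it.

    Returns [] if either value is missing (an assumed policy).
    """
    try:
        start_pos = items.index(start)
        # Begin the second search at start_pos so an earlier `end` is ignored.
        end_pos = items.index(end, start_pos)
    except ValueError:  # list.index raises ValueError if the value is absent
        return []
    return items[start_pos:end_pos + 1]


r = list(range(10))
print(slice_between(r, 3, 7))   # [3, 4, 5, 6, 7]
print(slice_between(r, 3, 42))  # []
```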
yamm
  • Isn't this going through the list twice? (for each `index()` method) – confused00 May 08 '15 at 08:55
  • @confused00 `index` is a built-in method, which is a lot faster than looping through the list like you did. Then slicing the list with [start:end] is a lot more efficient than appending each value to a list. I tried to do a benchmark with your code, but it seems like it is an endless loop. – yamm May 08 '15 at 09:07
  • @confused00 I did a quick benchmark: my solution takes `818 ns` while the accepted answer takes `4.15 µs` --> `4150 ns`, so the built-in functions are 5.1 times faster even though I traverse the list twice ;-). Your solution takes `2.29 µs` --> `2290 ns`, so it is 2.8 times slower than mine but still faster than the 'pythonic' version. I would argue that my solution is not only the fastest but also the most pythonic, because it's readable. – yamm May 26 '15 at 09:14
  • it also ignores my point in the OP mentioning "while only traversing the list once" – confused00 May 26 '15 at 11:07
  • @confused00 Honestly, I don't know what the built-in `index` does. Maybe it doesn't even traverse the list. – yamm May 26 '15 at 13:22

This is more compact, and I feel it's more readable:

>>> x = ['abc', 'ann', 'elsa', 'silva', 'john', 'carlos', 'michel']
>>> x[x.index('ann'): x.index('john') + 1]
['ann', 'elsa', 'silva', 'john']
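Whether two C-level `index()` scans beat a single Python-level loop is easiest to settle by measuring on your own data. A minimal `timeit` sketch comparing the question's loop with this slice-based version; the 100,000 generated names and the chosen boundaries are made-up test data, and no timings are claimed here:

```python
import timeit


def loop_version(names, start, end):
    # The question's loop, wrapped in a function (inclusive).
    result = []
    adding = False
    for name in names:
        if name == start:
            adding = True
        if adding:
            result.append(name)
        if name == end:
            break
    return result


def index_version(names, start, end):
    # The slice-based answer: two index() scans, one slice.
    return names[names.index(start):names.index(end) + 1]


# Hypothetical test data: a large list with the interval near its end.
names = [f'name{i}' for i in range(100_000)]
start, end = 'name90000', 'name90010'

# Both versions must agree before their timings are worth comparing.
assert loop_version(names, start, end) == index_version(names, start, end)

for fn in (loop_version, index_version):
    seconds = timeit.timeit(lambda: fn(names, start, end), number=10)
    print(f'{fn.__name__}: {seconds:.4f} s for 10 calls')
```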
Davide
  • Again, this traverses the list twice. – confused00 May 08 '15 at 08:57
  • Every possible solution to your problem will have to go through the list at least once in the worst case, giving you `O(n)` worst-case complexity. So traversing once, twice or three times really won't make much difference. Also, you asked for a "pythonic" way of doing things, not a super-optimized way... If you're downvoting all these valid solutions, you should really stop it... – Davide May 08 '15 at 09:03
  • I said in the OP that I want it to traverse the list only once, I haven't downvoted anyone, and complexity theory doesn't translate one-to-one into practice. Traversing twice still takes twice the time, and my list is huge. – confused00 May 08 '15 at 09:07
  • Complexity theory applies exactly to huge lists. That's what it's for. However, list.index is probably implemented in C (like a lot of other list stuff in Python), so two calls to index are probably faster than traversing the list just once in Python. The only way to find out is by testing it. – Davide May 08 '15 at 09:15

You could do something like

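def between(l, start, end):
    first_index = l.index(start)
    # Search for `end` starting at first_index, then convert that absolute
    # position into an offset within the sub-slice l[first_index:] (inclusive).
    return l[first_index:][:l.index(end, first_index) - first_index + 1]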

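That doesn't really look nice, but it should avoid a second full traversal, because the second `index()` call starts searching at `first_index`. Note, though, that slicing a list creates a new (shallow) copy, so `l[first_index:]` copies the tail of the list.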

Edit: OP noted that the index returned by the second `index()` call must be adjusted (by subtracting `first_index`) to suit the sub-slice.
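Since list slicing always builds a new list, the two-step `l[first_index:][...]` first copies the whole tail of the list. A variant (the name `between_single_slice` is hypothetical) can pass the absolute end position straight to a single slice, so no offset adjustment is needed and only the result is copied:

```python
def between_single_slice(items, start, end):
    """Inclusive slice from `start` to the first `end` at or after it."""
    first_index = items.index(start)
    # index(end, first_index) returns a position in the original list,
    # so it can be used directly as the (exclusive) stop of one slice.
    return items[first_index:items.index(end, first_index) + 1]


x = ['abc', 'ann', 'elsa', 'silva', 'john', 'carlos', 'michel']
print(between_single_slice(x, 'ann', 'john'))
# ['ann', 'elsa', 'silva', 'john']
```

Like the original, it raises `ValueError` if either value is missing.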

dhke
  • Does it avoid dual traversal? It still uses the `index()` method twice. Isn't a traversal done for each call? – confused00 May 08 '15 at 09:04
  • `index()` takes a second `start` argument which indicates the starting position of the search. The second traverse thus skips the initial part of the list. I'm not so sure, however, if there isn't a hidden copy of the list created somewhere in there. – dhke May 08 '15 at 09:47
  • I see, makes sense. I tried to test it, but to get it working I had to change the last line to `return l[first_index:][:l.index(end, first_index) - first_index + 1]`. Of the ones I've tested, yours is the fastest, so I'll use this (it's also very compact). Sorry for not accepting your answer; I just think I phrased the question badly, and the accepted answer best fits what I asked for in the OP. – confused00 May 08 '15 at 10:00
  • You are right about the change, I missed the point that the index position is for the original list and not the sliced version. – dhke May 08 '15 at 10:16