3

I have a question because I cannot find a solution to my problem.

gen is a generator (the result of difflib.Differ.compare()).

Normally, by iterating over gen I can read each line. The problem is that on each iteration I need to read the current line and the next two lines.

Example (normal output by iterating line by line):

iteration 1:
    line = 'a'
iteration 2:
    line = 'b'
iteration 3:
    line = 'c'
iteration 4:
    line = 'd'
iteration 5:
    line = 'e'
iteration 6:
    line = 'f'
iteration 7: 
    line = 'g'

but in my case I need to get this:

iteration 1:
    line = 'a'
    next1 = 'b'
    next2 = 'c'
iteration 2:
    line = 'b'
    next1 = 'c'
    next2 = 'd'
iteration 3:
    line = 'c'
    next1 = 'd'
    next2 = 'e'
iteration 4:
    line = 'd'
    next1 = 'e'
    next2 = 'f'
iteration 5:
    line = 'e'
    next1 = 'f'
    next2 = 'g'
iteration 6:
    line = 'f'
    next1 = 'g'
    next2 = None
iteration 7: 
    line = 'g'
    next1 = None
    next2 = None

I tried playing with gen.send() and itertools.islice(), but I cannot find a proper solution. I don't want to convert this generator into a list (then I could read next1 as gen[i + 1] and next2 as gen[i + 2]), because that is very inefficient when the diff output is large.

user1880342

5 Answers

6

This is what I'd suggest as a general solution for any iterator/generator. I think it's most efficient this way.

def genby3(gen):
    it = iter(gen) # Accept any iterable and walk it exactly once
    L1 = it.next() # Get the first two elements
    L2 = it.next()
    for L3 in it:
        yield [L1, L2, L3] # Get the results grouped in 3
        L1, L2 = L2, L3 # Slide the window forward by one element
    yield [L1, L2, None] # And take care of the last 2 cases
    yield [L2, None, None]

print list(genby3(xrange(10)))

If you were reading from a file, you could seek, call readline and then seek back, but that gets messy, so it is simpler to treat the file like any other iterator.

UPDATE: Generalized it to any number of items per iteration; apart from that it works just like the version above.

def genby(gen, n):
    assert n>=1, 'This does not make sense with less than one element'
    it = iter(gen)
    last = list(it.next() for i in xrange(n-1)) # Prime the window with n-1 items

    for nth_item in it:
        last = last+[nth_item]
        yield last
        last = last[1:] # New list, so the one already yielded is not mutated

    for i in xrange(n-1):
        last = last+[None] # Pad the tail with None
        yield last
        last = last[1:]

r = xrange(10)
for i, n in enumerate(genby(r, 3)):
    print i, 'iteration'
    print '\t', n

Edit 2: Moved the list concatenation before the yield statement, to avoid doing it twice. A slight performance improvement.
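For readers on Python 3, here is a minimal sketch of the same sliding-window idea built on collections.deque with maxlen, which drops the oldest item automatically. The name genby_deque and the fillvalue parameter are made up for this example; it is one possible variant, not the code from this answer.

```python
from collections import deque


def genby_deque(iterable, n=3, fillvalue=None):
    """Yield one n-tuple per input item: the item plus the n-1 items after it."""
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)  # appending to a full deque drops the oldest item
    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)  # a new tuple, so callers may safely keep it
    # Pad the tail so the last n-1 items still get a window each.
    for _ in range(n - 1):
        window.append(fillvalue)
        if len(window) == n:
            yield tuple(window)
```

Used as `for line, next1, next2 in genby_deque(gen): ...`, it should give one triple per line of the diff while holding only three items at a time.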

jadkik94
  • I have tried it and it works as expected, but I wonder if this is the most efficient way of doing it. In my opinion the best way to achieve this was suggested by TorelTwiddler. – user1880342 Dec 05 '12 at 21:56
  • @user1880342 It's the same as mine, except mine is wrapped in a function that acts as a generator too. And the updated version works for more than only 3 items, but that might not be relevant to your use case. – jadkik94 Dec 06 '12 at 07:44
  • @user1880342 Also, it does not take care of the last few items with `None`s, but it would be straightforward to do it from there. – jadkik94 Dec 06 '12 at 07:51
3

Try keeping temporary variables.

line = iterator.next()
next1 = iterator.next()

for next2 in iterator:
    #do stuff
    line = next1
    next1 = next2
TorelTwiddler
  • Looking at this code, it seems to be the most efficient way, doesn't it? I have tried this example and it seems to work as expected. – user1880342 Dec 05 '12 at 21:52
  • It's effectively the same thing as jadkik94's solution, except his is constructed to be its own function. For simplicity and ease of use, I would recommend using his `genby3` function: `for x, y, z in genby3(xrange(10)):...`. – TorelTwiddler Dec 06 '12 at 20:45
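The loop in this answer stops two items early, since nothing fills in the trailing `None`s. One way to cover the tail, sketched here for Python 3, is to pad the iterator with two `None` values using itertools.chain. The function name process and the seen list are placeholders standing in for the "do stuff" body, and the sketch assumes the stream itself never contains `None` (true for difflib output, which yields strings).

```python
from itertools import chain


def process(lines):
    """Collect (line, next1, next2) triples, padding the tail with None."""
    seen = []
    iterator = chain(lines, (None, None))  # two padding values for the tail
    line = next(iterator)
    next1 = next(iterator)
    for next2 in iterator:
        seen.append((line, next1, next2))  # stand-in for the real work
        line, next1 = next1, next2
    return seen
```

Because of the padding, the two initial next() calls always succeed, so an empty input simply produces no triples.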
2

There's a recipe in the itertools docs, pairwise(). It can be adapted:

from itertools import tee, izip_longest

def triplewise(iterable):
    xs, ys, zs = tee(iterable, 3)
    next(ys, None)
    next(zs, None)
    next(zs, None)
    return izip_longest(xs, ys, zs)

for line, next1, next2 in triplewise(gen):
    ...

It can also be generalized:

from itertools import tee, izip, izip_longest, islice

no_fillvalue = object()

def nwise(iterable, n=2, fillvalue=no_fillvalue):
    iters = (islice(each, i, None) for i, each in enumerate(tee(iterable, n)))
    if fillvalue is no_fillvalue:
        return izip(*iters)
    return izip_longest(*iters, fillvalue=fillvalue)

for line, next1, next2 in nwise(gen, 3, None):
    ...
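
The code above is Python 2. In Python 3, izip and izip_longest no longer exist: the built-in zip is lazy and itertools offers zip_longest. A sketch of the same nwise under that assumption (the sentinel is renamed _no_fillvalue here purely for illustration):

```python
from itertools import islice, tee, zip_longest

_no_fillvalue = object()  # sentinel meaning "no padding requested"


def nwise(iterable, n=2, fillvalue=_no_fillvalue):
    # tee() gives n independent iterators; islice() advances the i-th one by i.
    iters = [islice(each, i, None) for i, each in enumerate(tee(iterable, n))]
    if fillvalue is _no_fillvalue:
        return zip(*iters)  # stops at the last complete window
    return zip_longest(*iters, fillvalue=fillvalue)  # pads the tail
```

With `nwise(gen, 3, None)` this should produce the triples from the question, including the two padded ones at the end.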
pillmuncher
1

How about zipping the three sequences together?

izip_longest(gen, islice(gen,1,None), islice(gen,2,None), fillvalue=None)
ceyko
  • That doesn't work. The first iteration produces the first, third and sixth values generated by `gen` and discards the values in between, because all three arguments consume the same generator. – pillmuncher Dec 05 '12 at 21:23
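
To see the effect described in the comment, here is a small Python 3 demonstration (zip_longest replaces izip_longest; the letters generator is a stand-in for `gen`). It contrasts the shared generator with independent copies made by itertools.tee.

```python
from itertools import islice, tee, zip_longest


def letters():
    # Stand-in for gen: a one-shot generator over seven items.
    yield from "abcdefg"


# Shared iterator: every islice pulls from the same underlying generator.
gen = letters()
broken = zip_longest(gen, islice(gen, 1, None), islice(gen, 2, None))
first_broken = next(broken)  # ('a', 'c', 'f')

# Independent copies via tee(): each position gets its own view of the stream.
xs, ys, zs = tee(letters(), 3)
fixed = zip_longest(xs, islice(ys, 1, None), islice(zs, 2, None))
first_fixed = next(fixed)  # ('a', 'b', 'c')
```

The one-liner would work on a list, where each islice starts from the beginning again, but not on a generator that can only be consumed once.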
1

You could use something like this:

def genTriplets(a):
    first = a.next()
    second = a.next()
    third = a.next()
    while True:
        yield (first, second, third)
        first = second
        second = third
        try:
            third = a.next()
        except StopIteration:
            third = None
            if (first is None and second is None and third is None):
                break
dckrooney
  • I have tried it and it works as expected, but I wonder if this is the most efficient way of doing it. In my opinion the best way to achieve this was suggested by TorelTwiddler. – user1880342 Dec 05 '12 at 21:57
  • Yes, TorelTwiddler's approach is faster (though both are O(n)). However, wrapping your iteration logic in a generator allows you to separate it from the code invoking it; this could make later changes easier (such as accommodating variable window-widths, as jadkik94 implemented). Depending on your application, this might be worth the performance hit. – dckrooney Dec 05 '12 at 22:54