8

Let's say I have a list of strings:

a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']

I want to make a list of items that appear at least twice in a row:

result = ['a', 'c']

I know I have to use a for loop, but I can't figure out how to target the items repeated in a row. How can I do so?

EDIT: What if the same item has more than one run in a? Then set() would be ineffective, because it collapses separate runs into a single entry:

a = ['a', 'b', 'a', 'a', 'c', 'a', 'a', 'a', 'd', 'd']
result = ['a', 'a', 'd']
Albert

9 Answers

7

try itertools.groupby() here:

>>> from itertools import groupby,islice
>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'b']

>>> [list(g) for k,g in groupby(a)]
[['a', 'a'], ['b'], ['c', 'c', 'c'], ['b']] 

>>> [k for k,g in groupby(a) if len(list(g))>=2]
['a', 'c']

Using islice():

>>> [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]
['a', 'c']

using zip() and izip():

In [198]: set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])
Out[198]: set(['a', 'c'])

In [199]: set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])
Out[199]: set(['a', 'c'])

timeit results:

from itertools import *

a='aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'

def grp_isl():
    [k for k,g in groupby(a) if len(list(islice(g,0,2)))==2]

def grpby():
    [k for k,g in groupby(a) if len(list(g))>=2]

def chn():
    set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])

def dread():
    set(a[i] for i in range(1, len(a)) if a[i] == a[i-1])

def xdread():
    set(a[i] for i in xrange(1, len(a)) if a[i] == a[i-1])

def inrow():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x

def zipp():
    set(x[0] for x in zip(a,a[1:]) if x[0]==x[1])

def izipp():
    set(x[0] for x in izip(a,a[1:]) if x[0]==x[1])

if __name__=="__main__":
    import timeit
    print "islice",timeit.timeit("grp_isl()", setup="from __main__ import grp_isl")
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby")
    print "dread",timeit.timeit("dread()", setup="from __main__ import dread")
    print "xdread",timeit.timeit("xdread()", setup="from __main__ import xdread")
    print "chain",timeit.timeit("chn()", setup="from __main__ import chn")
    print "inrow",timeit.timeit("inrow()", setup="from __main__ import inrow")
    print "zip",timeit.timeit("zipp()", setup="from __main__ import zipp")
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp")

output:

islice 39.9123107277
grpby 30.1204478987
dread 17.8041124706
xdread 15.3691785568
chain 17.4777339702
inrow 11.8577565327           
zip 16.6348844045
izip 15.1468557105

Conclusion:

poke's solution (inrow above) is the fastest of the alternatives timed here.
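The harness above is Python 2 (print statements, izip, xrange). To rerun the comparison on Python 3, a minimal sketch could look like the following. It covers only three of the candidates, the number=10000 repeat count is an arbitrary assumption, and timings will differ from the figures quoted above.

```python
import timeit
from itertools import groupby

a = 'aaaabbbccccddddefgggghhhhhiiiiiijjjkkklllmnooooooppppppppqqqqqqsssstuuvv'

def grpby():
    # One entry per run of length >= 2.
    return [k for k, g in groupby(a) if len(list(g)) >= 2]

def zipp():
    # Python 3's zip() is lazy, so it plays the role of Python 2's izip().
    return set(x for x, y in zip(a, a[1:]) if x == y)

def inrow():
    # poke's loop, returning the list so the results can be compared.
    in_row = []
    last = None
    for x in a:
        if last == x and (not in_row or in_row[-1] != x):
            in_row.append(x)
        last = x
    return in_row

if __name__ == "__main__":
    for func in (grpby, zipp, inrow):
        # number=10000 is an assumption chosen to keep the run short.
        elapsed = timeit.timeit(func, number=10000)
        print(func.__name__, elapsed)
```

Passing the function object straight to timeit.timeit() avoids the "from __main__ import ..." setup strings used in the original.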

Ashwini Chaudhary
  • 1
    I think you missed the "in a row" part. Nice simple solution otherwise, even if it is O(n^2). – Mark Ransom Oct 22 '12 at 22:14
  • Counter still doesn't do "in a row". – g.d.d.c Oct 22 '12 at 22:18
  • @g.d.d.c what's "in a row"? I don't get it. :| – Ashwini Chaudhary Oct 22 '12 at 22:20
  • The element repeated over-and-over, so `f(a a a b c c c c b)` would return only `a c`. – Blender Oct 22 '12 at 22:21
  • Your answer would also match `a = ['a', 'b', 'a', 'c', 'c', 'c', 'd']` which would be wrong according to the OP. The repeated elements need to be in consecutive positions in the original list. – g.d.d.c Oct 22 '12 at 22:22
  • @Blender @g.d.d.c I guess the OP needs `groupby()` then. solution edited. – Ashwini Chaudhary Oct 22 '12 at 22:28
  • @AshwiniChaudhary Can you add @DRead's solution with `xrange` in place of `range`? It's faster and theoretically has a smaller memory footprint. – kreativitea Oct 23 '12 at 00:04
  • @kreativitea added, but weirdly it's slower than `range()`; `izip()` came out to be the fastest. – Ashwini Chaudhary Oct 23 '12 at 00:16
  • @AshwiniChaudhary That's weird, `xrange` beats `range` on my machine-- 1.6109 vs 1.4546 – kreativitea Oct 23 '12 at 00:37
  • @AshwiniChaudhary Thanks for your detailed answer! What if a = ['r', 'g', 'r', 'r', 'p', 'r', 'r', 'r'], so that f(a) would return [r, r]. The set function would inhibit the latter row of the repeated element. – Albert Oct 23 '12 at 01:26
  • For fairness, you should have added my version too, because at least on my computer it’s about 15% faster… and probably a lot more readable than some of those other solutions. – poke Oct 23 '12 at 16:45
  • @poke you're quite right, I timed these solutions again and your solution came out to be the fastest one. So a +1 to your solution. – Ashwini Chaudhary Oct 23 '12 at 18:12
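For the input in Albert's comment, the groupby() comprehension from this answer already keeps one entry per run, because it never goes through set(). A small sketch in Python 3 follows. The function name repeated_runs is made up for illustration, and sum(1 for _ in group) is swapped in for len(list(g)) only to avoid building a throwaway list.

```python
from itertools import groupby

def repeated_runs(items):
    """Return one entry for every run of two or more equal neighbours."""
    # groupby() starts a new group each time the value changes, so the
    # same value can show up in several groups.
    return [key for key, group in groupby(items)
            if sum(1 for _ in group) >= 2]

a = ['r', 'g', 'r', 'r', 'p', 'r', 'r', 'r']
print(repeated_runs(a))
```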
5

This sounds like homework, so I'll just outline what I would do:

  1. Iterate over a, but keep the index of each element in a variable. enumerate() will be useful.
  2. Inside of your for loop, start a while loop from the current item's index.
  3. Repeat the loop as long as the next element is the same as the previous (or the original). break will be useful here.
  4. Count the number of times that loop repeats (you'll need some counter variable for this).
  5. Append the item to your result if your counter variable is >= 2.
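One way the outline above could be turned into code is sketched below (Python 3). The function name and the skip_until variable are assumptions added for illustration. The outline does not say how to avoid counting the same run again from its second element, so this sketch skips indices that an earlier run already covered.

```python
def runs_of_two_or_more(items):
    """Return one entry per run of length >= 2, in order of appearance."""
    result = []
    skip_until = 0  # first index not yet covered by a counted run (assumption)
    for index, item in enumerate(items):
        if index < skip_until:
            continue  # still inside a run that was already counted
        count = 1
        # Walk forward while the following elements equal the current one.
        while index + count < len(items) and items[index + count] == item:
            count += 1
        skip_until = index + count
        if count >= 2:
            result.append(item)
    return result

print(runs_of_two_or_more(['a', 'a', 'b', 'c', 'c', 'c', 'd']))
```

Here the while condition does the job that break would do in the outline: the inner loop stops as soon as the next element differs.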
Blender
3

My take:

>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
>>> inRow = []
>>> last = None
>>> for x in a:
...     if last == x and (len(inRow) == 0 or inRow[-1] != x):
...         inRow.append(last)
...     last = x
...
>>> inRow
['a', 'c']
poke
3

How about:

set([a[i] for i in range(1, len(a)) if a[i] == a[i-1]])

D Read
  • 1
    This is easily the most elegant solution. I would use `xrange` over `range` for this, though, you will save a bit of time and a bunch of memory. – kreativitea Oct 22 '12 at 23:54
2

Here's a Python one-liner that will do what I think you want. It uses the itertools package:

from itertools import chain, izip

a = "aabbbdeefggh" 

set(x[1] for x in chain(izip(*([iter(a)] * 2)), izip(*([iter(a[1:])] * 2))) if x[0] == x[1])
Will
  • this gave me incorrect output for `a="aabbbdeefggh"`, I expected `{'a', 'b', 'e', 'g'}` but got `{'a', 'b', 'e'}`. – Ashwini Chaudhary Oct 22 '12 at 22:44
  • I can see why this is happening. It is only creating 'even' pairs. This could be solved by repeating the process with odd pairs as well. – Will Oct 22 '12 at 22:48
  • Updated now to solve that problem. There is still an edge case, but it's a one liner :-) – Will Oct 22 '12 at 22:54
  • I updated it to get rid of it. It was when the list was odd and the iterator 'wrapped' onto the second pass and paired the last element from the 'even' list with the first element from the 'odd' list. – Will Oct 22 '12 at 23:11
  • +1 I just timed your solution , and it's quite faster than `groupby()`, posting the `timeit` result in my solution. – Ashwini Chaudhary Oct 22 '12 at 23:46
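To see why the first version only caught the 'even' pairs, here is a small sketch of what each half of the one-liner produces. It is written for Python 3, where zip() takes the place of izip(), and the helper names pairs and repeated are made up for illustration.

```python
def pairs(seq):
    """Non-overlapping pairs: both zip() arguments are the SAME iterator."""
    return list(zip(*[iter(seq)] * 2))

def repeated(seq):
    """Repeats found at even offsets, at odd offsets, and their union."""
    even = {x for x, y in pairs(seq) if x == y}
    odd = {x for x, y in pairs(seq[1:]) if x == y}
    return even, odd, even | odd

a = "aabbbdeefggh"
even, odd, both = repeated(a)
print(pairs(a))
print(sorted(even), sorted(odd), sorted(both))
```

The pair ('g', 'g') only lines up when the pairing starts at index 1, which is why the even-only pass missed 'g'.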
1

The edited question asks to avoid set(), which rules out most of the answers.

I thought I'd compare the fancy one-liner list comprehensions with the good-old loop from @poke and another I created:

from itertools import *

a = 'aaaabbbccccaaaaefgggghhhhhiiiiiijjjkkklllmnooooooaaaaaaaaqqqqqqsssstuuvv'

def izipp():
    return set(x[0] for x in izip(a, a[1:]) if x[0] == x[1])

def grpby():
    return [k for k,g in groupby(a) if len(list(g))>=2]

def poke():
    inRow = []
    last = None
    for x in a:
        if last == x and (len(inRow) == 0 or inRow[-1] != x):
            inRow.append(last)
        last = x
    return inRow    

def dread2():
    repeated_chars = []
    previous_char = ''
    for char in a:
        if repeated_chars and char == repeated_chars[-1]:
            continue
        if char == previous_char:
            repeated_chars.append(char)
        else:
            previous_char = char
    return repeated_chars

if __name__=="__main__":
    import timeit
    print "izip",timeit.timeit("izipp()", setup="from __main__ import izipp"),''.join(izipp())
    print "grpby",timeit.timeit("grpby()", setup="from __main__ import grpby"),''.join(grpby())
    print "poke",timeit.timeit("poke()", setup="from __main__ import poke"),''.join(poke())
    print "dread2",timeit.timeit("dread2()", setup="from __main__ import dread2"),''.join(dread2())

Gives me results:

izip 13.2173779011 acbgihkjloqsuv
grpby 18.1190848351 abcaghijkloaqsuv
poke 11.8500328064 abcaghijkloaqsuv
dread2 9.0088801384 abcaghijkloaqsuv

So a basic loop seems faster than all the comprehension-based one-liners, and dread2 runs in about half the time of groupby(). However, the basic loops are harder to read and write, so I'd probably stick with groupby() in most circumstances.

D Read
  • poke congratulations for your solution. @AshwiniChaudhary was wrong though to say yours was fastest, as dread2 is faster! – D Read Oct 24 '12 at 10:18
  • Indeed, but yours is still based on mine. But it’s interesting to see how such few changes can actually further improve the performance, so good job for that ;) – poke Oct 24 '12 at 10:57
  • Actually I didn't read yours before writing mine - the variables didn't make much sense, but it's such a simple problem so I understand the similarity. I'm just annoyed that my contribution is 25% faster than the next best one, yet that is not reflected in the answer ratings. – D Read Oct 24 '12 at 16:10
0

Here's a regex one-liner:

>>> import re
>>> mylist = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'a', 'a']
>>> results = [match[0][0] for match in re.findall(r'((\w)\2{1,})', ''.join(mylist))]
>>> results
['a', 'c', 'a']

Sorry, too lazy to time it.

verbsintransit
0
a = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
res=[]
for i in a:
    # Note: a.count(i) counts every occurrence in the list, not only consecutive ones.
    if a.count(i)>1 and i not in res:
        res.append(i)
print(res)
raton
0

Using enumerate to check for two in a row:

def repetitives(long_list):
  repeaters = []
  for counter, item in enumerate(long_list):
    # Skip index 0: long_list[-1] would wrap around to the last element.
    if counter > 0 and item == long_list[counter-1] and item not in repeaters:
      repeaters.append(item)
  return repeaters
Brownbat