
So basically, for example, if you have a list like:

l = ['a','b','a','b','c','c']

The output should be:

[['a','a'],['b','b'],['c','c']]

In other words, put the duplicated values together into their own lists.

I tried:

l = ['a','b','a','b','c','c']
it=iter(sorted(l))
next(it)
new_l=[]
for i in sorted(l):
   new_l.append([])
   if next(it,None)==i:
      new_l[-1].append(i)
   else:
      new_l.append([])

But it doesn't work, and even if it did, it wouldn't be efficient.
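For reference, here is one way the loop above could be made to work, assuming the goal is to group equal values after sorting (a sketch, not one of the posted answers). Instead of peeking ahead with a second iterator, it compares each item with the last group built so far:

```python
l = ['a', 'b', 'a', 'b', 'c', 'c']

new_l = []
for item in sorted(l):
    # After sorting, equal values are adjacent, so each item either
    # continues the current (last) group or starts a new one.
    if new_l and new_l[-1][0] == item:
        new_l[-1].append(item)
    else:
        new_l.append([item])

print(new_l)  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```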

Trenton McKinney
U13-Forward

8 Answers


Sort the list then use itertools.groupby:

>>> from itertools import groupby
>>> l = ['a','b','a','b','c','c']
>>> [list(g) for _, g in groupby(sorted(l))]
[['a', 'a'], ['b', 'b'], ['c', 'c']]

EDIT: This is probably not the fastest approach. Sorting takes O(n log n) time in the average case, and not every solution needs it (see the comments).
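The sort is needed because groupby only merges runs of adjacent equal items. A quick illustration on the question's list:

```python
from itertools import groupby

l = ['a', 'b', 'a', 'b', 'c', 'c']

# groupby only merges *consecutive* equal items, so on the unsorted list
# 'a' and 'b' each end up in two separate groups.
unsorted_groups = [list(g) for _, g in groupby(l)]
print(unsorted_groups)  # [['a'], ['b'], ['a'], ['b'], ['c', 'c']]

# Sorting first makes equal items adjacent, so each value forms one group.
sorted_groups = [list(g) for _, g in groupby(sorted(l))]
print(sorted_groups)  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```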

Chris_Rands
  • This requires an average time complexity of O(n log n), however. – blhsing Oct 12 '18 at 08:54
  • @blhsing Yes, I know. I'm not actually sure this is the best solution; it was just my first thought (one needs to be quick on SO). I will defer judgement to a `timeit` benchmark. – Chris_Rands Oct 12 '18 at 08:56
  • @Chris_Rands It's known that Python's `sorted` function has an average time complexity of O(n log n). – blhsing Oct 12 '18 at 08:57
  • @blhsing Yes, you just said that, I agree :) – Chris_Rands Oct 12 '18 at 09:01
  • Accepted. I didn't realize `itertools.groupby` could do this much :-) – U13-Forward Oct 12 '18 at 09:06
  • @U9-Forward Thanks, but I'm not convinced this is the best way. Austin's or blhsing's solutions might be faster, and will retain the order if the `OrderedCounter` recipe is added. – Chris_Rands Oct 12 '18 at 09:08
  • @Chris_Rands Or if the Python version remembers dict insertion order, i.e. 3.6 and above. – timgeb Oct 12 '18 at 09:10
  • @timgeb Indeed, or 3.7 and above for the guarantee across all Python implementations. – Chris_Rands Oct 12 '18 at 09:14
  • @timgeb BTW, I think the reason I thought of this first, and why it seems so intuitive, is that it follows a typical command-line (Unix) pattern of `sort | uniq -c`. – Chris_Rands Oct 12 '18 at 09:22

You can use collections.Counter:

from collections import Counter

l = ['a','b','a','b','c','c']
[[k] * c for k, c in Counter(l).items()]

This returns:

[['a', 'a'], ['b', 'b'], ['c', 'c']]

%%timeit comparison

  • Given a sample dataset of 100000 values, this answer is the fastest approach.
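The timing chart itself isn't reproduced here. Below is a minimal sketch of how a similar comparison could be run with the standard timeit module. The random 100,000-letter dataset and the three function names are assumptions for illustration, and absolute timings will vary by machine and Python version:

```python
import random
import string
import timeit
from collections import Counter
from itertools import groupby

# Assumption: 100,000 random lowercase letters (the original data isn't shown).
random.seed(0)
data = [random.choice(string.ascii_lowercase) for _ in range(100_000)]


def with_counter(l):
    # One pass to count, then build each group from its count.
    return [[k] * c for k, c in Counter(l).items()]


def with_groupby(l):
    # O(n log n) on average because of the sort.
    return [list(g) for _, g in groupby(sorted(l))]


def with_count(l):
    # One full l.count() scan per distinct value.
    return [l.count(x) * [x] for x in set(l)]


for func in (with_counter, with_groupby, with_count):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```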


Trenton McKinney
blhsing

Use collections.Counter:

from collections import Counter

l = ['a','b','a','b','c','c']
c = Counter(l)

print([[x] * y for x, y in c.items()])
# [['a', 'a'], ['b', 'b'], ['c', 'c']]
Austin
  • Works too, nice. – U13-Forward Oct 12 '18 at 08:56
  • This is the best solution. Easy to read and does not require sorting (if you use a Python version where dicts remember insertion order). – timgeb Oct 12 '18 at 09:02
  • @timgeb Agreed! Although of course sorting and retaining the insertion order are not always going to produce the same output (although they do for this data); I don't know for sure what the OP actually wants. – Chris_Rands Oct 12 '18 at 09:25
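To illustrate that last point: the two approaches agree on the question's data, but when the first occurrence isn't the smallest value they order the groups differently. A small example with a made-up input list:

```python
from collections import Counter
from itertools import groupby

l = ['b', 'a', 'b', 'a']  # made-up input where 'b' appears first

# Counter keeps first-insertion order (dict semantics, Python 3.7+).
by_counter = [[k] * c for k, c in Counter(l).items()]
# groupby over sorted() yields the groups in sorted order.
by_groupby = [list(g) for _, g in groupby(sorted(l))]

print(by_counter)  # [['b', 'b'], ['a', 'a']]
print(by_groupby)  # [['a', 'a'], ['b', 'b']]
```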

Here's a functional solution via itertools.groupby. As it requires sorting, this will have time complexity O(n log n).

from itertools import groupby
from operator import itemgetter

L = ['a','b','a','b','c','c']

res = list(map(list, map(itemgetter(1), groupby(sorted(L)))))
print(res)
# [['a', 'a'], ['b', 'b'], ['c', 'c']]

The syntax is cumbersome since Python does not offer native function composition. Composition is supported by the third-party library toolz:

from toolz import compose

foo = compose(list, itemgetter(1))
res = list(map(foo, groupby(sorted(L))))
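If pulling in toolz just for this isn't desirable, a right-to-left compose can be sketched with functools.reduce from the standard library. The compose function below is a hypothetical helper written for this example, not a standard-library API:

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def compose(*funcs):
    # Hypothetical helper (not in the standard library):
    # compose(f, g)(x) == f(g(x)), i.e. functions apply right to left,
    # the same order toolz.compose uses.
    return reduce(lambda f, g: lambda *args, **kwargs: f(g(*args, **kwargs)), funcs)


L = ['a', 'b', 'a', 'b', 'c', 'c']

foo = compose(list, itemgetter(1))  # take the group iterator, then materialize it
res = list(map(foo, groupby(sorted(L))))
print(res)  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```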
jpp

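Another approach is to use the zip function. Note that it only works when every value occurs the same number of times, because the chunk size is taken from the count of the first element: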

l = ['a','b','a','b','c','c','b','c', 'a']
l = sorted(l)
grouped = [list(item) for item in list(zip(*[iter(l)] * l.count(l[0])))]

Output

[['a', 'a', 'a'], ['b', 'b', 'b'], ['c', 'c', 'c']]
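A small counterexample showing what goes wrong when the counts differ (the input list is made up for illustration):

```python
l = ['a', 'a', 'b']  # made-up input: 'a' occurs twice, 'b' only once
l = sorted(l)

# The chunk size comes from the count of the first element (2 here),
# so zip cuts the sorted list into pairs and drops any incomplete chunk.
size = l.count(l[0])
grouped = [list(item) for item in zip(*[iter(l)] * size)]
print(grouped)  # [['a', 'a']]  (the lone 'b' is silently dropped)
```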
Mihai Alexandru-Ionut

My solution, using a list comprehension (where l is the input list), would be:

[l.count(x) * [x] for x in set(l)]
  • set(l) returns all the elements that appear in l, without duplicates
  • l.count(x) returns the number of times a specific element x appears in l
  • the * operator creates a new list with the elements of a list (in this case, [x]) repeated the specified number of times (in this case, l.count(x))
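Two things worth noting about this approach: set(l) has no guaranteed iteration order, so the groups can come out in any order, and each l.count(x) scans the whole list, giving O(n·k) time for k distinct values. If a predictable order is wanted, a variant (not part of the original answer) can swap set(l) for dict.fromkeys(l), which keeps first-occurrence order:

```python
l = ['b', 'a', 'b', 'c', 'a', 'c']

# dict.fromkeys(l) keeps the distinct values in first-occurrence order
# (dicts preserve insertion order in Python 3.7+), unlike set(l).
result = [l.count(x) * [x] for x in dict.fromkeys(l)]
print(result)  # [['b', 'b'], ['a', 'a'], ['c', 'c']]
```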
nikeros
l = ['a','b','a','b','c','c']

want = []
for i in set(l):
    want.append(list(filter(lambda x: x == i, l)))
print(want)    
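The filter call rescans the whole list once per distinct value. A single-pass alternative (a different technique from the answer above, using collections.defaultdict) collects the items as it goes:

```python
from collections import defaultdict

l = ['a', 'b', 'a', 'b', 'c', 'c']

# One pass over l: append each item to the list stored under its value.
groups = defaultdict(list)
for x in l:
    groups[x].append(x)

want = list(groups.values())
print(want)  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```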
r.user.05apr

Probably not the most efficient, but this is easy to understand:

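l = ['a','b','a','b','c','c']
counts = {}  # named counts to avoid shadowing the built-in dict
for i in l:
    if i in counts:  # counts[i] would raise KeyError for a new key
        counts[i] += 1
    else:
        counts[i] = 1

new = []
for key in counts:
    new.append([key] * counts[key])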
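The counting loop can be shortened with dict.get, which returns a default for missing keys. This is a variation on the answer above, and it is similar in spirit to what collections.Counter does for you:

```python
l = ['a', 'b', 'a', 'b', 'c', 'c']

counts = {}
for i in l:
    # dict.get returns 0 for a key that hasn't been seen yet,
    # which replaces an explicit if/else membership check.
    counts[i] = counts.get(i, 0) + 1

new = [[key] * n for key, n in counts.items()]
print(new)  # [['a', 'a'], ['b', 'b'], ['c', 'c']]
```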
timgeb
DanDeg