11

I am presented with a list made entirely of tuples, such as:

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

How can I split lst into as many lists as there are colours? In this case, that would be 3 lists:

[("hello", "Blue"), ("hey", "Blue")]
[("hi", "Red")]
[("yo", "Green")]

I just need to be able to work with these lists later, so I don't want to just output them to screen.

Details about the list

I know that every element of lst is strictly a two-element tuple, and the colour is always the second element of each tuple.

The problem

The problem is that lst depends on user input, so I won't always know how many colours there are in total or what they are. That is why I can't predefine variables to store these lists in.

So how can this be done?

Georgy
TGamer

6 Answers

9

You could use a collections.defaultdict to group by colour:

from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

print(colours)
# defaultdict(<class 'list'>, {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]})

Or if you prefer using no libraries, dict.setdefault is an option:

colours = {}
for word, colour in lst:
    colours.setdefault(colour, []).append((word, colour))

print(colours)
# {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]}

If you just want the colour tuples separated into nested lists of tuples, print the values() as a list:

print(list(colours.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

The benefit of both approaches is that they automatically create an empty list for each new key as you add it, so you don't have to do that yourself.
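Since the question asks to work with the lists later rather than print them, here is a minimal sketch of how the grouped result might be used afterwards. The names groups, blue and missing are only illustrative, and converting to a plain dict is an optional choice, not a requirement:

```python
from collections import defaultdict

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))

# Optional: convert to a plain dict so that looking up an unknown colour
# raises KeyError instead of silently creating a new empty list.
groups = dict(colours)

blue = groups["Blue"]                # the list of all Blue tuples
missing = groups.get("Purple", [])   # empty list, and no key is added

# Iterate over every colour and its list without knowing the colours in advance.
for colour, items in groups.items():
    print(colour, len(items))
```

Keys appear in the order in which each colour was first seen, because dicts preserve insertion order in Python 3.7 and later.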

RoadRunner
5

This can be done relatively efficiently with a supporting dict:

def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result

and the lists can be collected from result with dict.values():

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]


d = split_by_idx(lst)
print(list(d.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]

This could also be implemented with dict.setdefault() or a defaultdict, which are fundamentally the same except that you do not have to handle the "key not present" case explicitly:

def split_by_idx_sd(items, idx=1):
    result = {}
    for item in items:
        result.setdefault(item[idx], []).append(item)
    return result

import collections


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result

Timewise, the plain dict solution with the explicit key check is the fastest for your input:

%timeit split_by_idx(lst)
# 1000000 loops, best of 3: 776 ns per loop
%timeit split_by_idx_sd(lst)
# 1000000 loops, best of 3: 866 ns per loop
%timeit split_by_idx_dd(lst)
# 1000000 loops, best of 3: 1.16 µs per loop

but you would get different timings depending on the "collision rate" of your input, i.e. how often an item's key is already present in the dict. In general, you should expect split_by_idx() to be the fastest at a low collision rate (most of the entries create a new key in the dict), while split_by_idx_dd() should be the fastest at a high collision rate (most of the entries get appended to the list of an existing key).
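To check the collision-rate claim on your own machine, a rough benchmark sketch could look like the following. The make_items helper and the chosen sizes are assumptions made for illustration, and no timings are shown because they depend on the Python version and hardware:

```python
import collections
import random
import timeit


def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result


def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result


def make_items(n, n_keys, seed=0):
    # Hypothetical helper: n tuples whose second element is one of n_keys labels.
    rng = random.Random(seed)
    return [(i, rng.randrange(n_keys)) for i in range(n)]


low_collision = make_items(1000, 100_000)  # most items introduce a new key
high_collision = make_items(1000, 5)       # most items hit an existing key

for name, data in [("low", low_collision), ("high", high_collision)]:
    for func in (split_by_idx, split_by_idx_dd):
        seconds = timeit.timeit(lambda: func(data), number=200)
        print(f"{name:>4} collision  {func.__name__:<16} {seconds:.4f} s")
```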

norok2
3

In my opinion, the best option is to use defaultdict from collections:

from collections import defaultdict
colors = defaultdict(list)
for word, color in lst:
    colors[color].append(word)

This gives you a better data structure:

>>> colors
defaultdict(list, {'Blue': ['hello', 'hey'], 'Green': ['yo'], 'Red': ['hi']})

For example, you can work with it like this:

>>> for key, values in colors.items():
...     print([[key, value] for value in values])
...     
[['Blue', 'hello'], ['Blue', 'hey']]
[['Red', 'hi']]
[['Green', 'yo']]
wjandrea
Grijesh Chauhan
3
from itertools import groupby
from operator import itemgetter

indexer = itemgetter(1)
desired = [list(gr) for _, gr in groupby(sorted(lst, key=indexer), key=indexer)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]

We sort the list by the second item of each tuple (the colour) and then group it by that same item. Since "by the second item" is needed twice, the key function is stored once in the indexer variable.
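The sort matters because itertools.groupby only groups consecutive items with equal keys. A small sketch of the difference, using the question's list (the name by_colour is just illustrative):

```python
from itertools import groupby
from operator import itemgetter

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
by_colour = itemgetter(1)

# Without sorting, Blue is split into two groups because its tuples are not adjacent.
unsorted_groups = [list(gr) for _, gr in groupby(lst, key=by_colour)]
# [[('hello', 'Blue')], [('hi', 'Red')], [('hey', 'Blue')], [('yo', 'Green')]]

# Sorting by the same key first puts equal colours next to each other.
sorted_groups = [list(gr) for _, gr in groupby(sorted(lst, key=by_colour), key=by_colour)]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('yo', 'Green')], [('hi', 'Red')]]
```

Note that the groups come out in sorted colour order, not in the order the colours first appeared.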

Mustafa Aydın
2

You can do this (Python 3):

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
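colors = {elem[1] for elem in lst}  # make set of colors
colors = dict.fromkeys(colors, [])  # turn the set into a dict; every key starts with the same shared empty list

for t in lst:
    # build a new list each time, so the shared list is never mutated
    colors[t[1]] = [*colors[t[1]], t]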

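If you just want the lists of color tuples, you can print the values() of the colors dict (the order of the lists follows the set's iteration order, so it may differ from the output shown here):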

print(list(colors.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]
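One caveat with dict.fromkeys(colors, []) is that every key refers to the same list object. The loop above is safe only because it assigns a new list on each step; a sketch of what would happen with append() instead, next to a dict comprehension that gives each key its own list (the names shared and separate are illustrative):

```python
lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
names = ["Blue", "Red", "Green"]

# Pitfall: all three keys point at one and the same list.
shared = dict.fromkeys(names, [])
for t in lst:
    shared[t[1]].append(t)  # mutates the single shared list
all_same = shared["Blue"] is shared["Red"] is shared["Green"]  # True

# A dict comprehension evaluates [] once per key, so the lists are independent.
separate = {name: [] for name in names}
for t in lst:
    separate[t[1]].append(t)
```

With the shared version, every colour ends up holding all four tuples, while the separate version groups them as intended.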
Fuzz Norman
-1

You can do this:

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
colors = {elem[1] for elem in lst}

lsts = []

for color in colors:
    color_lst = [elem for elem in lst if elem[1] == color]
    lsts.append(color_lst)

colors contains the unique colors present in lst (set comprehension ensures this uniqueness), and lsts will contain the final 3 lists you need.

Here is what lsts ends up being (the order of the sublists may vary between runs, because sets are unordered): [[('hi', 'Red')], [('yo', 'Green')], [('hello', 'Blue'), ('hey', 'Blue')]].
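If a predictable order is wanted, one possible variation (a sketch, not part of the approach above) is to collect the unique colors with dict.fromkeys, which keeps first-seen order in Python 3.7 and later:

```python
lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]

# Unique colors in the order they first appear in lst.
colors = list(dict.fromkeys(elem[1] for elem in lst))

# One pass over lst per color, same as the set-based version.
lsts = [[elem for elem in lst if elem[1] == color] for color in colors]
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]
```

Like the set-based version, this scans lst once per color, so for many colors a single-pass dict approach is likely to be faster.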

Mario Ishac