How to go from list of words to a list of distinct letters in Python

Question

Using Python, I'm trying to convert a sentence of words into a flat list of all distinct letters in that sentence.

Here's my current code:

words = 'She sells seashells by the seashore'

ltr = []

# Convert the string that is "words" to a list of its component words
word_list = [x.strip().lower() for x in words.split(' ')]

# Now convert the list of component words to a distinct list of
# all letters encountered.
for word in word_list:
    for c in word:
        if c not in ltr:
            ltr.append(c)

print(ltr)

This code returns ['s', 'h', 'e', 'l', 'a', 'b', 'y', 't', 'o', 'r'], which is correct, but is there a more Pythonic way to get this answer, probably using list comprehensions or a set?

When I try to combine list-comprehension nesting and filtering, I get lists of lists instead of a flat list.

The order of the distinct letters in the final list (ltr) is not important; what's crucial is that they be unique.

+1 for a well-worded question and for including your attempt (what a welcome sight!) — mechanical_meat, Feb 11 '10 at 16:35
Note that `str.split` defaults to splitting on whitespace, so `.split(' ')` is usually spelled `.split()`. — Mike Graham, Feb 11 '10 at 16:54

score 13 · Accepted Answer · answered Feb 11 '10 at 16:53

Sets provide a simple, efficient solution.

words = 'She sells seashells by the seashore'

unique_letters = set(words.lower())
unique_letters.discard(' ') # If there was a space, remove it.
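
A minimal sketch of the same idea wrapped in a reusable function. The name `distinct_letters` is hypothetical, and the Python 3 syntax is an assumption; the thread itself dates from the Python 2 era.

```python
def distinct_letters(text):
    """Return the set of distinct lowercase characters in text, ignoring spaces."""
    letters = set(text.lower())   # a string is iterable, so set() sees each character
    letters.discard(' ')          # discard() is a no-op if no space was present
    return letters


words = 'She sells seashells by the seashore'
result = distinct_letters(words)

# Sets are unordered, so sort when a stable, printable list is wanted.
print(sorted(result))   # ['a', 'b', 'e', 'h', 'l', 'o', 'r', 's', 't', 'y']
```

Lowercasing the whole string once, before building the set, is the point SilentGhost raises in the comments on this answer.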

Mike Graham

pretty much the same as ephemient, but your answer is formatted nicer. – tgray Feb 11 '10 at 17:02
@tgray - this was also posted first; ephemient edited his in afterwards (see edit logs). – danben Feb 11 '10 at 17:04
@tgray: it's not about formatting, it's about efficiently applying `.lower`! – SilentGhost Feb 11 '10 at 17:04
I'd make it one line and more general with unique_letters = set(c for c in words.lower() if c.isalpha()) – job Feb 11 '10 at 17:04
Just the kind of shorthand answer I was looking for! Thanks! – Art Metzer Feb 11 '10 at 17:05
@job: Why though? This solution is cleaner, even if it is 2 lines *and* it doesn't run a comparison on every letter. – Jason Coon Feb 11 '10 at 17:21
I didn't notice this when I edited my answer, but in any case, Mike's answer is cleaner. +1 – ephemient Feb 11 '10 at 17:53

danben · Answer 2 · 2010-02-11T16:43:37.053

3

set([letter.lower() for letter in words if letter != ' '])

Edit: I just tried it and found this will also work (maybe this is what SilentGhost was referring to):

set(letter.lower() for letter in words if letter != ' ')

And if you need to have a list rather than a set, you can wrap the result in list():

list(set(letter.lower() for letter in words if letter != ' '))
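
The filter `letter != ' '` only drops spaces. A hedged sketch of a more general filter, along the lines job suggests in the comments on the accepted answer, uses `str.isalpha` so that digits and punctuation are dropped too. The sample sentence is made up for illustration.

```python
sentence = "She sells 3 seashells, by the seashore!"   # illustrative input, not from the question

# Filtering only spaces keeps digits and punctuation in the result.
spaces_only = set(ch.lower() for ch in sentence if ch != ' ')

# str.isalpha() keeps letters and drops everything else.
letters_only = set(ch.lower() for ch in sentence if ch.isalpha())

print(sorted(spaces_only - letters_only))   # ['!', ',', '3']
```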

edited Feb 11 '10 at 16:43

answered Feb 11 '10 at 16:31

danben

you don't need a list comprehension there – SilentGhost Feb 11 '10 at 16:39
@Art Metzer: Note that splitting isn't really necessary, since a string is an iterable type. This is probably why you were getting lists of lists when you didn't want them. – danben Feb 11 '10 at 16:40
@SilentGhost: I don't understand your comment, can you clarify? – danben Feb 11 '10 at 16:41
@danben: you can use a generator expression to avoid creating a new list. Like in Ignacio's answer. – mechanical_meat Feb 11 '10 at 16:42
you're using list comprehension, you don't have to. – SilentGhost Feb 11 '10 at 16:43

score 3 · Answer 3 · answered Feb 11 '10 at 16:32

Make ltr a set and change your loop body a little:

ltr = set()

for word in word_list:
    for c in word:
        ltr.add(c)

Or using a list comprehension:

ltr = set([c for word in word_list for c in word])
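
The order of the `for` clauses is what keeps the result flat. A small sketch contrasting the two forms follows; that the nested form is what produced the lists of lists mentioned in the question is an assumption, since the failing attempt is not shown.

```python
words = 'She sells seashells by the seashore'
word_list = words.lower().split()

# Nesting one comprehension inside another yields one inner list per word.
nested = [[c for c in word] for word in word_list]

# Two for clauses in a single comprehension read left to right like nested loops,
# so every character lands in one flat collection.
flat = {c for word in word_list for c in word}

print(nested[0])      # ['s', 'h', 'e']
print(sorted(flat))   # ['a', 'b', 'e', 'h', 'l', 'o', 'r', 's', 't', 'y']
```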

Eli Bendersky

ephemient · Answer 4 · 2010-02-11T16:54:12.283

>>> set('She sells seashells by the seashore'.replace(' ', '').lower())
set(['a', 'b', 'e', 'h', 'l', 'o', 's', 'r', 't', 'y'])
>>> set(c.lower() for c in 'She sells seashells by the seashore' if not c.isspace())
set(['a', 'b', 'e', 'h', 'l', 'o', 's', 'r', 't', 'y'])
>>> from itertools import chain
>>> set(chain(*'She sells seashells by the seashore'.lower().split()))
set(['a', 'b', 'e', 'h', 'l', 'o', 's', 'r', 't', 'y'])
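
In Python 3 the same expressions display as set literals such as `{'a', 'b', ...}`, with arbitrary element order. A sketch of the `itertools.chain` variant using `chain.from_iterable`, which avoids unpacking the word list into separate arguments; the sorting is added here only to make the printed output deterministic.

```python
from itertools import chain

words = 'She sells seashells by the seashore'

# chain.from_iterable() walks each word in turn and yields its characters,
# so no argument unpacking with * is needed.
letters = set(chain.from_iterable(words.lower().split()))

print(sorted(letters))   # ['a', 'b', 'e', 'h', 'l', 'o', 'r', 's', 't', 'y']
```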

SilentGhost · Answer 5 · 2010-02-11T16:57:10.017

Here are some timings made with py3k:

>>> import timeit
>>> def t():                    # mine (see history)
    a = {i.lower() for i in words}
    a.discard(' ')
    return a

>>> timeit.timeit(t)
7.993071812372081
>>> def b():                    # danben
    return set(letter.lower() for letter in words if letter != ' ')

>>> timeit.timeit(b)
9.982847967921138
>>> def c():                    # ephemient in comment
    return {i.lower() for i in words if i != ' '}

>>> timeit.timeit(c)
8.241267610375516
>>> def d():                    # Mike Graham
    a = set(words.lower())
    a.discard(' ')
    return a

>>> timeit.timeit(d)
2.7693045186082372
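
The session above assumes `words` is already defined. A self-contained sketch for repeating the comparison follows; the function names and the reduced `number` are choices made here, and absolute timings vary by machine and Python version, so none are shown.

```python
import timeit

words = 'She sells seashells by the seashore'


def per_char_lower():
    # Calls .lower() once per character, then drops the space.
    result = {ch.lower() for ch in words}
    result.discard(' ')
    return result


def whole_string_lower():
    # Calls .lower() once on the whole string before building the set.
    result = set(words.lower())
    result.discard(' ')
    return result


# Both approaches must agree before their speed is worth comparing.
assert per_char_lower() == whole_string_lower()

for func in (per_char_lower, whole_string_lower):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```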

@ephemient: it works, but it's a bit slower. And btw, the `set(i.lower() for i in words if i != ' ')` version is 20% slower in py3k. – SilentGhost Feb 11 '10 at 16:51

score 0 · Answer 6 · answered Feb 11 '10 at 16:35

set(l for w in word_list for l in w)

Ignacio Vazquez-Abrams

@danben: You *did* notice that `word_list` is already split by spaces, right? – Ignacio Vazquez-Abrams Feb 11 '10 at 16:36

score 0 · Answer 7 · answered Feb 11 '10 at 17:01

words = 'She sells seashells by the seashore'

ltr = list(set(list(words.lower())))
ltr.remove(' ')
print(ltr)
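
One difference from the accepted answer is worth noting: `list.remove` raises `ValueError` when the item is absent, whereas `set.discard` does nothing. A short sketch with a hypothetical input that contains no space:

```python
word = 'seashore'   # hypothetical input with no space in it

as_list = list(set(word))
try:
    as_list.remove(' ')        # list.remove raises ValueError if ' ' is absent
    removed_ok = True
except ValueError:
    removed_ok = False

as_set = set(word)
as_set.discard(' ')            # set.discard silently does nothing here

print(removed_ok, sorted(as_set))   # False ['a', 'e', 'h', 'o', 'r', 's']
```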

ZenGyro

oh dear, oh dear, oh dear... here is a useful read: http://docs.python.org/reference/datamodel.html#the-standard-type-hierarchy – SilentGhost Feb 11 '10 at 17:03
