0

I have an alphabetically sorted list of names, like:

list = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ..]

How can I get elements for each starting letter? Do I have to iterate over the list once, or does Python have a function for this? I'm new to Python, so this may be a really naive question.

Suppose I want the second element among the names that start with 'A'; in this case I would get 'ACE'.

Qantas 94 Heavy
JudyJiang
  • `[x for x in list if x[0] == "A"]` something like that – Cory Kramer Mar 18 '14 at 13:18
  • What's the bigger picture? If you're going to use the results repeatedly, build a dictionary (or `collections.defaultdict(list)`): `d = {'A': ['ABC', 'ACE'], 'B': ['BED', 'BRT'], ...}`, then your query becomes `d['A'][1] == "ACE"` – jonrsharpe Mar 18 '14 at 13:30

6 Answers

3

Using a generator expression and itertools.islice:

>>> import itertools
>>> names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'ACE'

>>> names = ['ABC', 'BBD', 'BED', 'BRT', 'CCD']
>>> next(itertools.islice((name for name in names if name.startswith('A')), 1, 2), 'no-such-name')
'no-such-name'
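
The same idea can be wrapped in a small helper. This is only a sketch: the name `nth_match` and its parameters are made up for illustration, and it assumes every element is a string.

```python
from itertools import islice

def nth_match(names, prefix, n, default=None):
    """Return the n-th (0-based) name starting with prefix, or default."""
    matches = (name for name in names if name.startswith(prefix))
    # islice lazily skips the first n matches; next() stops at the one we want.
    return next(islice(matches, n, None), default)

names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
print(nth_match(names, 'A', 1))  # ACE
print(nth_match(names, 'C', 1))  # None (only one name starts with 'C')
```

Because the generator is consumed lazily, the scan stops as soon as the requested match is found.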
falsetru
  • That means I still have to iterate over the list once, right? My list contains 60,448 elements... – JudyJiang Mar 18 '14 at 13:23
  • @JudyJiang, Yes, you have to iterate over it, but this stops as soon as it finds the second matching element. – falsetru Mar 18 '14 at 13:31
  • @JudyJiang, If you do this kind of retrieval multiple times, you'd be better off building a dictionary once that maps each first letter to the list of matching elements, then querying the dictionary: `d['A'][1]` – falsetru Mar 18 '14 at 13:32
  • I just iterate over it once; say I want 50 names, 2 names from 'A' and 2 from 'B', and so on, 2 names for each letter of the alphabet. – JudyJiang Mar 18 '14 at 13:36
  • @JudyJiang, See [jonrsharpe's answer](http://stackoverflow.com/a/22481069/2225682), which builds a dictionary that maps the first character to the list. Also see a slightly different version that stores at most 2 items for each letter: http://ideone.com/ZB1TH8 – falsetru Mar 18 '14 at 13:48
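
For the "at most 2 names per letter" case discussed in these comments, a single pass might look roughly like the following. This is a sketch written for illustration, not a copy of the linked snippet; the name `first_k_per_letter` is hypothetical, and it assumes every name is a non-empty string.

```python
from collections import defaultdict

def first_k_per_letter(names, k=2):
    """Collect at most k names per first letter in a single pass."""
    groups = defaultdict(list)
    for name in names:
        bucket = groups[name[0]]
        # Only keep the name if this letter's bucket is not full yet.
        if len(bucket) < k:
            bucket.append(name)
    return dict(groups)

names = ['ABC', 'ACE', 'ADD', 'BED', 'BRT', 'CCD']
print(first_k_per_letter(names))
# {'A': ['ABC', 'ACE'], 'B': ['BED', 'BRT'], 'C': ['CCD']}
```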
3

If you're going to do multiple searches, you should take the one-time hit of iterating through everything and build a dictionary (or, to make it simpler, collections.defaultdict):

from collections import defaultdict

d = defaultdict(list)

words = ['ABC', 'ACE', 'BED', 'BRT', 'CCD', ...]

for word in words:
    d[word[0]].append(word)

(Note that you shouldn't name your own variable list, as it shadows the built-in.)

Now you can easily query for the second word starting with "A":

d["A"][1] == "ACE"

or the first two words for each letter:

first_two = {c: w[:2] for c, w in d.items()}
jonrsharpe
1

Since the list is already sorted, simply group all the elements by their first character (groupby only groups adjacent items):

from itertools import groupby
from operator import itemgetter

example = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']


d = {g:list(values) for g, values in groupby(example, itemgetter(0))}

Now, to get the values starting with 'A':

print(d.get('A', []))

This is most useful when you have a static list and will run multiple queries, since once the dictionary is built, getting the 3rd item starting with 'A' is an O(1) lookup.
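
One caveat worth illustrating: groupby only merges adjacent items with equal keys, so unsorted input has to be sorted by the same key first. A minimal sketch, assuming non-empty strings (the function name `group_by_first_letter` is made up here):

```python
from itertools import groupby
from operator import itemgetter

def group_by_first_letter(names):
    """Map each first letter to the list of names starting with it."""
    first_char = itemgetter(0)
    # sorted() is stable, so names keep their relative order within each letter.
    ordered = sorted(names, key=first_char)
    return {letter: list(group) for letter, group in groupby(ordered, key=first_char)}

unsorted_names = ['BED', 'ABC', 'BRT', 'ACE', 'CCD']
groups = group_by_first_letter(unsorted_names)
print(groups['A'])  # ['ABC', 'ACE']
```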

Samy Arous
0

You might want to use a list comprehension:

mylist = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
elements_starting_with_A = [i for i in mylist if i[0] == 'A']
# ['ABC', 'ACE']
second = elements_starting_with_A[1]
# 'ACE'
Germano
0

In addition to the list comprehensions others have mentioned, lists also have a sort() method:

mylist = ['AA', 'BB', 'AB', 'CA', 'AC']
newlist = [i for i in mylist if i[0] == 'A']
newlist.sort()
print(newlist)
# ['AA', 'AB', 'AC']
0

The simple solution is to iterate over the whole list in O(n):

(name for name in names if name.startswith('A'))

However, once the names are sorted (yours already are), you can use binary search with lexicographic comparison to find, in O(log(n)), the index where the names starting with a given letter begin, and likewise the index where they end. The bisect module will help you find these bounds:

from bisect import bisect_left

names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']

names.sort() 

lower = bisect_left(names, 'B')
upper = bisect_left(names, chr(1+ord('B')))

print(names[lower:upper])
# ['BED', 'BRT']
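
Generalised to any letter, the bounds lookup could be sketched like this. The function name `names_with_initial` is hypothetical, and the sketch assumes the list is already sorted and that `letter` is a single character.

```python
from bisect import bisect_left

def names_with_initial(sorted_names, letter):
    """Return the names in sorted_names whose first character is letter."""
    lower = bisect_left(sorted_names, letter)
    # The next character sorts after every string that starts with letter.
    upper = bisect_left(sorted_names, chr(ord(letter) + 1))
    return sorted_names[lower:upper]

names = ['ABC', 'ACE', 'BED', 'BRT', 'CCD']
print(names_with_initial(names, 'A'))     # ['ABC', 'ACE']
print(names_with_initial(names, 'A')[1])  # ACE
print(names_with_initial(names, 'Z'))     # []
```

Each call does two binary searches plus a slice, so the list is never scanned in full.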
Kiwi