I'm working with a dictionary for an anagram program in Python. The keys are tuples of sorted letters, and the values are lists of the possible words with those letters:

wordlist = {
   ('d', 'g', 'o'): ['dog', 'god'],
   ('a', 'c', 't'): ['act', 'cat'],
   ('a', 's', 't'): ['sat', 'tas'],
}

I am using a regex to filter the word lists down. Given r't$' as the filter, the final result should be:

filtered_list = {
   ('a', 'c', 't'): ['act', 'cat'],
   ('a', 's', 't'): ['sat'],
}

So far I've gotten it down to two steps. First, keep all of the words that match the expression:

import re

tmp = {k: [w for w in v if re.search(r't$', w)] for k, v in wordlist.items()}

This leaves me with empty lists:

{
   ('d', 'g', 'o'): [],
   ('a', 'c', 't'): ['act', 'cat'],
   ('a', 's', 't'): ['sat'],
}

Then I need a second pass to get rid of the empty lists:

filtered_list = {k: v for k, v in tmp.items() if v}

I'm sure there is a way to do these in one step, but I haven't figured it out yet. Is there a way to combine them? Or a better way to do this in general?

phraktyl
  • Welcome to SO. This is an excellent first question. – shx2 Apr 25 '14 at 17:58
  • Thanks! I tried to make sure I did all of my research beforehand. I'm a twenty-year Perl guy, and there are some weird Python idioms that I'm still trying to wrap my head around. – phraktyl Apr 25 '14 at 20:58

2 Answers

Doing this in two steps is fine, and probably good for readability.
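
For comparison, the same two-step idea can be written as a plain loop that only stores a key when at least one of its words matches. This is a minimal sketch, not code from the question: the function name `filter_wordlist` is made up for illustration, and the pattern is compiled once on the assumption that it is reused for every word.

```python
import re


def filter_wordlist(wordlist, pattern):
    """Keep only the matching words, and drop keys left with no words.

    filter_wordlist is a hypothetical helper name used for this sketch.
    """
    regex = re.compile(pattern)  # compile once, reuse for every word
    result = {}
    for key, words in wordlist.items():
        matches = [w for w in words if regex.search(w)]
        if matches:  # skip keys whose list ended up empty
            result[key] = matches
    return result


wordlist = {
    ('d', 'g', 'o'): ['dog', 'god'],
    ('a', 'c', 't'): ['act', 'cat'],
    ('a', 's', 't'): ['sat', 'tas'],
}

# prints {('a', 'c', 't'): ['act', 'cat'], ('a', 's', 't'): ['sat']}
print(filter_wordlist(wordlist, r't$'))
```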

But to answer your question, here's a one-liner (broken into multiple lines, for readability). It uses a generator expression for generating the pairs from the first step.

{
  k:v for k, v in
  (
    (kk, [w for w in vv if re.search(r't$', w)])
    for kk, vv in wordlist.items()
  )
  if v
}
=> {('a', 'c', 't'): ['act', 'cat'], ('a', 's', 't'): ['sat']}
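
If Python 3.8 or newer is available (an assumption; the question only mentions python3), an assignment expression might be another way to get a single comprehension without the nested generator. The sketch below is a different technique from the one above, and the name `matches` is introduced here only for illustration.

```python
import re

wordlist = {
    ('d', 'g', 'o'): ['dog', 'god'],
    ('a', 'c', 't'): ['act', 'cat'],
    ('a', 's', 't'): ['sat', 'tas'],
}

# Requires Python 3.8+ for the := operator.
# The condition runs first, binds the filtered list to `matches`,
# and the key is kept only when that list is non-empty.
filtered_list = {
    k: matches
    for k, v in wordlist.items()
    if (matches := [w for w in v if re.search(r't$', w)])
}

# prints {('a', 'c', 't'): ['act', 'cat'], ('a', 's', 't'): ['sat']}
print(filtered_list)
```

Each word is tested against the regex only once here, since the filtered list is built a single time per key.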
shx2
  • This is what I was looking for. Excellent! Thank you! It also helps further my understanding of list comprehension and generators, which are very unfamiliar at this point. – phraktyl Apr 25 '14 at 17:49
  • Assuming this dictionary is large, on python2 would it be useful to use `iteritems` instead of `items`? – SethMMorton Apr 25 '14 at 18:10
  • @SethMMorton, absolutely. That's almost always true in python2. I use `items` because that's what OP used and because that specific point is irrelevant to the question asked. – shx2 Apr 25 '14 at 18:13
  • I'm using python3, so as I understand it items() is the same as iteritems() was in python2. – phraktyl Apr 25 '14 at 21:14

For a one-liner, something like this?

A = {k:[w for w in v if re.search(r't$', w)] for k,v in wordlist.items() if any(re.search(r't$', w) for w in v)}
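
To try the line above on the sample data from the question, here is a self-contained sketch. It compiles the regex once under the name `pattern` (a name introduced here, not part of the answer) so the expression does not have to be written out twice. Note that words under a matching key are still searched twice, once by `any()` and once by the list comprehension.

```python
import re

wordlist = {
    ('d', 'g', 'o'): ['dog', 'god'],
    ('a', 'c', 't'): ['act', 'cat'],
    ('a', 's', 't'): ['sat', 'tas'],
}

pattern = re.compile(r't$')  # hypothetical name; compiled once and reused

A = {
    k: [w for w in v if pattern.search(w)]   # keep only the matching words
    for k, v in wordlist.items()
    if any(pattern.search(w) for w in v)     # keep only keys with a match
}

# prints {('a', 'c', 't'): ['act', 'cat'], ('a', 's', 't'): ['sat']}
print(A)
```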
ysakamoto
  • I would have originally suggested splitting into >1 line but this is actually very readable, nice. – joc Apr 25 '14 at 17:38
  • This looks nice, but it seems like it includes all of the words in the list if any of the words match, as opposed to just the words that match. It included 'sat' and 'tas', where only 'sat' matches the expression. – phraktyl Apr 25 '14 at 17:44
  • so you want the last one in your key to be `t`? – ysakamoto Apr 25 '14 at 17:45
  • I want every key with words that match the expression, but only the words in the list that match. Both of the words in the ('a', 'c', 't') key match ('act', 'cat'), so they are both included. But only one of the words in the ('a', 's', 't') key matches ('sat'), so only that one is included with that key. – phraktyl Apr 25 '14 at 18:02