1

I have a collection of dictionaries declared as class variables (and stored in FILE_TYPES) that define filetypes. I need to cycle through these dictionaries to pair filetype names (stored in the 'name' key) with filenames that are passed into the class. Which of the following ways to do this is more pythonic? Or, is there a better way altogether -- maybe some dict comprehension I'm missing?

First way:

filetype = [ftype for ftype in cls.FILE_TYPES if ftype['name'] in filename][0]

Second way:

for ftype in cls.FILE_TYPES:
    if ftype['name'] in filename:
        filetype = ftype

I am not stuck on this per se, but I'd like to know if either of these approaches is better (e.g., more or less pythonic), or if it makes no difference to anyone.

I generally try to avoid nested logic like you see in the second approach when I'm coding in Python (not sure why entirely, but at some point this habit was hammered into me by god-knows-who). However, accessing the invariably singular list element in the first method with [0] also seems obnoxious.

justadampaul
  • 2
    These two code snippets are not doing the same thing – pault Oct 29 '19 at 16:14
  • 3
    These don't do the same thing. But in any case, don't use a list comprehension if you don't need to make a list. I'm not really sure what you mean by "nested logic" but it looks like you are doing that in both of these, as far as I can tell. – juanpa.arrivillaga Oct 29 '19 at 16:14
  • the 2nd is equivalent to `filetype = [ftype for ftype in cls.FILE_TYPES if ftype['name'] in filename][-1]` if anything – Ruzihm Oct 29 '19 at 16:15
  • I'm aware they are not generally logically identical. But in the special case that there will only ever be a single match to the if condition, the result will always be the same, won't it? – justadampaul Oct 29 '19 at 16:16
  • 5
    Yes, in that special case, but you can also just add a `break` statement to the second to remove the ambiguity. Related: [Get the first item from an iterable that matches a condition](https://stackoverflow.com/questions/2361426/get-the-first-item-from-an-iterable-that-matches-a-condition) – pault Oct 29 '19 at 16:17
  • 1
    Again, don't use a list comprehension here (a dict comprehension makes even less sense). Just use a for-loop but add a break when you do find it – juanpa.arrivillaga Oct 29 '19 at 16:19
  • Sure, a `break` statement works great. I'll do that. Thanks! – justadampaul Oct 29 '19 at 16:20

3 Answers

0

I would use a generator expression inside next() instead of the list comprehension, e.g.:

filetype = next(ftype for ftype in cls.FILE_TYPES if ftype['name'] in filename)

The fundamental difference is that the list comprehension cannot stop as soon as the condition is met, so it always scans the whole of FILE_TYPES (less computationally efficient), and it builds an unnecessary temporary list (less memory efficient too).

To test it:

file_types = [dict(name='foo'), dict(name='bar'), dict(name='baz'), dict(name='pam'), dict(name='egg'),]
filename = 'spam'

filetype = next(ftype for ftype in file_types if ftype['name'] in filename)
print(filetype)
# {'name': 'pam'}

filetype = [ftype for ftype in file_types if ftype['name'] in filename][0]
print(filetype)
# {'name': 'pam'}

for ftype in file_types:
    if ftype['name'] in filename:
        filetype = ftype
        break
print(filetype)
# {'name': 'pam'}
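
To make the early-termination point concrete, here is a minimal sketch that counts how many entries each approach inspects. The helper make_counter and the reordered file_types list are made up for this illustration and are not part of the original question:

```python
def make_counter():
    """Return a match predicate plus the list of names it has checked so far."""
    checked = []

    def matches(ftype, filename):
        checked.append(ftype['name'])  # record every entry we look at
        return ftype['name'] in filename

    return matches, checked


# Hypothetical data: the match ('pam') is the second of four entries.
file_types = [dict(name='foo'), dict(name='pam'), dict(name='bar'), dict(name='baz')]
filename = 'spam'

# Generator expression inside next(): stops at the first match.
matches, checked_lazy = make_counter()
first = next(ftype for ftype in file_types if matches(ftype, filename))

# List comprehension: tests every entry before [0] is applied.
matches, checked_eager = make_counter()
first_from_list = [ftype for ftype in file_types if matches(ftype, filename)][0]

print(first, len(checked_lazy))
# {'name': 'pam'} 2
print(first_from_list, len(checked_eager))
# {'name': 'pam'} 4
```

Both give the same result here, but the list comprehension keeps going after the match has already been found.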

EDIT: As per @pault's comment, this answer takes fundamentally the same approach as the accepted answer to "Get the first item from an iterable that matches a condition".

norok2
  • This is the same as the accepted answer on [Get the first item from an iterable that matches a condition](https://stackoverflow.com/questions/2361426/get-the-first-item-from-an-iterable-that-matches-a-condition). Better to just VTC, IMO. – pault Oct 29 '19 at 16:24
  • 2
    Most definitely. I just did not see the comment. Of course, everything you can write about Python was already written by Alex Martelli 10 years ago. – norok2 Oct 29 '19 at 16:27
  • 1
    @pault VTC? Vote to close? – norok2 Oct 29 '19 at 16:32
  • This works just as well -- better if you need the exception, worse if you don't. I'll let the other post serve as the answer anyway, as the other solution will still live in the comments. – justadampaul Oct 29 '19 at 16:35
  • @justadampaul see the edits for an explanation of why `next()` would be preferable. – norok2 Oct 29 '19 at 16:38
  • Sorry - I agree. I meant I would accept the "Answered elsewhere" request, and the comments on my original post above can live on to elaborate on the possibility of a for...break solution. Not sure what the protocol is on accepting "duplicate" answers, but I'm happy to accept this one as well as linking to the other question. – justadampaul Oct 29 '19 at 17:44
  • @justadampaul oh OK, I misunderstood the comment probably. No idea what the official policy on accepting answers would be. – norok2 Oct 29 '19 at 18:12
0

Here is a potentially useful different approach to the problem.

If the filetype is determined by the extension of the file, then the os.path.splitext function can be used to separate the extension from the rest of the filename. Then, the dictionary of filetypes can be a mapping of extensions to whatever filetype object you need.

import os

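# As a function/method. FILETYPES is assumed to be a dict keyed by
# extension, including the leading dot (e.g. '.txt').
def get_filetype(filename):
    ext = os.path.splitext(filename)[1]  # 'data.txt' -> '.txt'
    return FILETYPES[ext]  # raises KeyError for an unknown extension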

filetype = get_filetype(filename)


# or the one-liner
filetype = FILETYPES[os.path.splitext(filename)[1]]

The upshot is that the lookup runs in constant time on average, regardless of the number of possible extensions, as long as each extension is a key in the FILETYPES dict.

This approach also works if you don't use the extensions to figure out the filetype so long as there is some part of the filename that can be extracted and mapped to the filetype. The only change needed would be to change the code that extracts the extension to code that extracts whatever part of the filename determines the filetype.
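
As a rough, self-contained sketch of the extension-based lookup: the FILETYPES contents, the lower-casing of the extension, and the default parameter are all assumptions added for illustration, not something the question specifies.

```python
import os

# Hypothetical mapping. The keys include the leading dot because
# os.path.splitext returns the extension with it.
FILETYPES = {
    '.csv': {'name': 'csv'},
    '.json': {'name': 'json'},
}


def get_filetype(filename, default=None):
    """Look up the filetype by extension, returning `default` if it is unknown."""
    ext = os.path.splitext(filename)[1].lower()  # 'report.CSV' -> '.csv'
    return FILETYPES.get(ext, default)


print(get_filetype('report.CSV'))
# {'name': 'csv'}
print(get_filetype('notes.txt'))
# None
```

Using dict.get instead of indexing trades the KeyError for a default value; whether that is preferable depends on how the caller wants to handle unknown files.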

0

I would prefer using next(), which raises StopIteration if no match exists:

filename = "foofoo"
filename2 = "nope"
FILE_TYPES = [{'name':'foo'},{'name':'baz'},{'name':'bat'},{'name':'bar'}]

try:
    print(next(x for x in FILE_TYPES if x['name'] in filename))
    print(next(x for x in FILE_TYPES if x['name'] in filename2))
except StopIteration:
    print("no such element")

Output:

{'name': 'foo'}
no such element
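
If the exception is not wanted, next() also accepts a default as its second argument. A minimal sketch, where the wrapper name find_filetype is made up for this example:

```python
FILE_TYPES = [{'name': 'foo'}, {'name': 'baz'}, {'name': 'bat'}, {'name': 'bar'}]


def find_filetype(filename, file_types=FILE_TYPES):
    """Return the first matching filetype dict, or None if nothing matches."""
    # The generator expression needs its own parentheses when next()
    # is given a second argument.
    return next((x for x in file_types if x['name'] in filename), None)


print(find_filetype("foofoo"))
# {'name': 'foo'}
print(find_filetype("nope"))
# None
```

The caller then checks for None instead of wrapping the call in try/except.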
Ruzihm