
I have a set of files in a directory, so I created a function that applies some processing to all the files in the directory:

def fancy_function(directory, regex):
    for set the path of the directory:
       with open the file names and walk over them:
           preprocessing1 = [....]
           preprocessing2 = [ remove some punctuation from preprocessing1]

           return preprocessing2

Then I do the following:

list_of_lists = fancy_function(a_directory, a_regex)
print list_of_lists
>>>['processed string']

It returns just one list, although the directory actually has 5 files. However, when I do the following:

def fancy_function(directory, regex):
    do preprocessing...
    preprocessing1 = [....]
    preprocessing2 = [ remove some punctuation from preprocessing1]
    print preprocessing2

print fancy_function(a_directory, a_regex)

It prints the 5 preprocessed files that I want, like this:

['file one']
['file two']
['file three']
['file four']
['file five']

Why is this happening, and how can I obtain the 5 files in a list? I would like to save them in one list so that I can then do further processing on each list in the main list, something like this:

main_list =[['file one'], ['file two'], ['file three'], ['file four'], ['file five']]
  • I'm guessing you have the `return` inside a loop. `return` causes the function to exit, aborting any loops. Please show your actual code. – BrenBarn Jan 02 '15 at 04:12
  • It's a lot of code. I moved the return out of the for loop and now it just returns 2 words of the final list. – newWithPython Jan 02 '15 at 04:21
  • You can try to use yield instead of return inside the loop. – Paulo Scardine Jan 02 '15 at 04:26
  • You don't need `yield`... you need to `append` to the list. Currently you are recreating the list on each iteration. Then you should also put the `return` outside the loop. – Anentropic Jan 02 '15 at 04:27
  • @PauloScardine when I used yield it returned the following: . How can I return the 5 lists in a list? – newWithPython Jan 02 '15 at 04:30
  • @Anentropic, true. It seems like I'm just recreating a new list. Could you provide me an example? – newWithPython Jan 02 '15 at 04:31
  • Just iterate over the result of the function (it will work like a list) or wrap the result in the list constructor: `list(fancy_function(a_directory, a_regex))`. – Paulo Scardine Jan 02 '15 at 05:19
  • @Anentropic I think it is the other way around... the OP doesn't really need a list if he is just iterating over it once. In this case yield has some advantages over returning a list. – Paulo Scardine Jan 02 '15 at 05:31
  • In fact I need the return value as a list, since I will process that list of lists of strings. – newWithPython Jan 02 '15 at 06:47

1 Answer


You have a return statement inside a for loop, which is a common gotcha. The function ends immediately, returning a single element, instead of returning a list of all the processed elements.
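
To see the difference between the two versions in the question, here is a minimal sketch. It assumes a plain list of strings in place of the files, the function names `return_in_loop` and `print_in_loop` are made up for illustration, and it uses Python 3 `print()` calls rather than the Python 2 `print` statement used elsewhere on this page.

```python
def return_in_loop(items):
    for item in items:
        processed = [item.strip()]
        return processed  # exits the function during the first iteration


def print_in_loop(items):
    for item in items:
        processed = [item.strip()]
        print(processed)  # runs on every iteration; the loop keeps going
    # no return statement, so the function returns None


names = [" file one ", " file two ", " file three "]

print(return_in_loop(names))   # ['file one']
result = print_in_loop(names)  # prints three lists, one per line
print(result)                  # None
```

The version with `print` only looks like it works: every file shows up on screen, but nothing is handed back to the caller.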

You have two options. First, you can explicitly define a list within your function, append intermediate results to that list, and return the list at the end.

def fancy_function(directory, regex):
    preprocessed_list = []
    for set the path of the directory:
        with open the file names and walk over them:
            preprocessing1 = [....]
            preprocessing2 = [ remove some punctuation from preprocessing1]

            preprocessed_list.append(preprocessing2)
    return preprocessed_list
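
As a rough, runnable sketch of this first option: the real preprocessing steps are not shown in the question, so the code below assumes each file is read as text and that `regex` matches the punctuation to remove. The `os.walk` traversal, the demo files, and the pattern `[^\w\s]` are all assumptions made for illustration (Python 3 syntax).

```python
import os
import re
import tempfile


def fancy_function(directory, regex):
    """Return one single-element list per file found under `directory`."""
    preprocessed_list = []
    for root, _dirs, filenames in os.walk(directory):
        for name in sorted(filenames):
            with open(os.path.join(root, name)) as handle:
                text = handle.read().strip()       # stand-in for preprocessing1
            cleaned = [re.sub(regex, "", text)]     # stand-in for preprocessing2
            preprocessed_list.append(cleaned)       # collect, do not return yet
    return preprocessed_list                        # single return, after the loops


# Demo on a throwaway directory containing two small files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, content in [("a.txt", "file, one!"), ("b.txt", "file: two?")]:
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(content)
    main_list = fancy_function(demo_dir, r"[^\w\s]")

print(main_list)  # [['file one'], ['file two']]
```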

Or fancier, you can turn your function into a generator.

def fancy_function(directory, regex):
    for set the path of the directory:
        with open the file names and walk over them:
            preprocessing1 = [....]
            preprocessing2 = [ remove some punctuation from preprocessing1]

            yield preprocessing2 # notice yield, not return; no accumulator list is needed

This generator can then be used thus:

>>> preprocessed = fancy_function(a_directory, a_regex)
>>> print list(preprocessed)
[['file one'], ['file two'], ['file three'], ['file four'], ['file five']]
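
One thing that can be surprising: calling a generator function does not run its body. It gives back a generator object, and the values are only produced when something iterates over it. A small sketch of that behaviour, using a hypothetical `strip_punctuation` generator that works on strings instead of files (Python 3 syntax, and the regex is an assumption):

```python
import re


def strip_punctuation(texts, regex):
    # Hypothetical stand-in for fancy_function, working on strings instead of files.
    for text in texts:
        yield [re.sub(regex, "", text)]


texts = ["file, one!", "file: two?"]

gen = strip_punctuation(texts, r"[^\w\s]")
print(gen)          # a generator object, not the results; nothing has run yet

main_list = list(gen)
print(main_list)    # [['file one'], ['file two']]

print(list(gen))    # [] because a generator can only be consumed once
```

If the result needs to be processed more than once, or indexed, wrapping the call in `list(...)` right away (or using the append version) is probably the simpler choice.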
  • Cool, no problem! Once you're comfortable with Python's basics, I really recommend reading up on generators, because my explanation above doesn't do them justice. Generators and the list comprehension family really add a bit of magic to Python's power and expressiveness. – zehnpaard Jan 02 '15 at 04:40
  • Could you provide me some references? Thanks. – newWithPython Jan 02 '15 at 04:42
  • I added a link to a SO question about what generators are good for. The Python Wiki has a very quick intro on [why and how you create a generator](https://wiki.python.org/moin/Generators). I feel [Jeff Knupp's blog post gives a more friendly discussion](http://www.jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/), but you should stop when he starts talking about Moar Power. Once you're comfortable and want something truly mind-bending, read/watch David Beazley's presentations - [example](http://www.dabeaz.com/generators/Generators.pdf) – zehnpaard Jan 02 '15 at 04:50
  • By the way, I didn't mean to say Jeff Knupp's post was weaker after Moar Power. It's just that the discussion moves on to coroutines, which use the same syntax and mechanism but add another order of magnitude of complexity and conceptually have a very different use from standard generators. Once you've mastered generators and want more mind-warping programming concepts, you should go read up on coroutines. You'll probably encounter use cases for generators all the time, coroutines not so much. – zehnpaard Jan 02 '15 at 05:03