
The challenge here is evaluating multiple large files.

What code will instruct Python to "load" a limited number of files into memory, process them, garbage collect, and then load the next set?

import os

def main(directory):
    """
    Create AudioAnalysis Objects from directory and call object_analysis().
    """
    # `audio` is the module that provides LocalAudioFile (its import is not shown here)
    for f in os.listdir(directory):
        # can we limit the number we load at one time?
        audiofile = audio.LocalAudioFile(os.path.join(directory, f))  # hungry!

I tried adding `audiofile = 0` to the loop, but the memory allocation is the same.

As I understand it, Lazy Evaluation "is an evaluation strategy which delays the evaluation of an expression until its value is needed", but in this case I need to delay evaluation until there's memory available.

I'm expecting that a decorator, a descriptor, and/or Python's property() function may be involved, or possibly buffering or queueing the input.
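To make the "limited number of files at a time" idea concrete, here is a minimal sketch of batching with a generator. The names `batches`, `process_in_batches`, `load`, and `analyze` are hypothetical and not part of any library; `load` stands in for whatever creates the hungry object. It only bounds memory if the loaded objects are actually released once the last reference to them is dropped, which is exactly what does not seem to be happening here.

```python
import gc


def batches(items, size):
    """Yield successive lists holding at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover items that did not fill a whole batch
        yield batch


def process_in_batches(paths, load, analyze, size=2):
    """Load, analyze, and drop `size` files at a time.

    `load` and `analyze` are caller-supplied callables (assumptions).
    """
    results = []
    for group in batches(paths, size):
        loaded = [load(path) for path in group]
        results.extend(analyze(obj) for obj in loaded)
        del loaded    # drop the only references to this batch
        gc.collect()  # also break any reference cycles
    return results
```

For example, `process_in_batches(paths, load=audio.LocalAudioFile, analyze=some_function, size=2)` would keep at most two files loaded at once, assuming the objects can be freed at all.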

MikeiLL
  • Does `LocalAudioFile` load the file into memory? It looks like your code only keeps one file--the current one--in memory at a time, loading a new file each time through the loop. – chepner Aug 18 '14 at 19:52
  • Have you tried triggering `gc.collect` in each iteration of the loop? – tobias_k Aug 18 '14 at 20:03
  • Also, what machine are you using, i.e. 32- or 64-bit? What OS as well? It could be that others can manage it the way it is. I still hail being curious but just to help you with the problem of running out of memory, which I assume you have. – Aleksander Lidtke Aug 18 '14 at 20:06
  • Well my dev OS is osx 64 bit (`sysctl hw.cpu64bit_capable returns 1`), but I'm hoping the solution will be portable - planning to host `*NIX`, maybe with Linode. And am way past memory limits on a 1GB plan. – MikeiLL Aug 18 '14 at 21:35
  • @tobias_k tried gc.collect yesterday (and again just now) and it doesn't seem to affect the memory grab happening in the loop. – MikeiLL Aug 18 '14 at 21:41
  • @chepner `LocalAudioFile` certainly seems to load the file into memory as well as creating an object associated with it. The memory usage there increases with the number of files, so it must be loading them all together. – MikeiLL Aug 18 '14 at 21:44

1 Answer

Here's one solution: have Python spawn a process, run the function on one file, then exit. The parent proc will collect results from each of the files.

This is in no way graceful, but if LocalAudioFile refuses to be dislodged from memory, it allows some flexibility in getting results.

This code runs a function on each Python file in the current directory whose name starts with `q`, returning a message to the parent process, which prints it out.

source

import glob, multiprocessing, os

def proc(path):
    """
    Process a single file in a worker process and return a message for the parent.
    """
    # audiofile = audio.LocalAudioFile(path) # hungry!
    return 'woot: {}'.format(path)

if __name__ == '__main__':  # required for Windows
    pool = multiprocessing.Pool()   # one Process per CPU
    paths = [os.path.abspath(name) for name in glob.glob('q*.py')]
    for output in pool.map(proc, paths):
        print('output: {}'.format(output))  # works on Python 2 and 3

output

output: woot: /home/johnm/src/johntellsall/karma/qpopen.py
output: woot: /home/johnm/src/johntellsall/karma/quotes.py
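
A stricter variant of the same idea, sketched here with `subprocess` rather than `multiprocessing`: start a fresh interpreter for each file, so that everything the file allocated is returned to the OS when that interpreter exits. The names `WORKER_SOURCE`, `analyze_in_child`, and `analyze_all` are made up for illustration, and the worker only reports the file size as a stand-in for the real analysis.

```python
import json
import subprocess
import sys

# Stand-in worker script. In real use, the memory-hungry call
# (e.g. audio.LocalAudioFile(path)) would go here instead of getsize().
WORKER_SOURCE = """
import json, os, sys
path = sys.argv[1]
result = {"path": path, "bytes": os.path.getsize(path)}
sys.stdout.write(json.dumps(result))
"""


def analyze_in_child(path):
    """Run the worker on one file in a fresh interpreter; return its result."""
    raw = subprocess.check_output([sys.executable, "-c", WORKER_SOURCE, path])
    # check_output returns only after the child has exited,
    # so the child's memory has already been reclaimed by the OS.
    return json.loads(raw.decode("utf-8"))


def analyze_all(paths):
    """Process files strictly one at a time, yielding each result lazily."""
    for path in paths:
        yield analyze_in_child(path)
```

Usage would look like `for result in analyze_all(glob.glob('*.mp3')): print(result)`. This trades parallelism for a hard guarantee that only one file is in memory at a time. If the pool above is preferred, `multiprocessing.Pool` also accepts a `maxtasksperchild` argument, which replaces a worker process after it has completed that many tasks.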
johntellsall
  • Going now to look up the multiprocessing module, but should I be able to substitute `glob.glob('*.mp3')` for `glob.glob('q*.py')` here or am i misunderstanding the solution? – MikeiLL Aug 18 '14 at 21:48
  • @MikeiLL yes, `glob.glob('*.mp3')` is fine. Have fun! – johntellsall Aug 18 '14 at 21:52
  • This is really interesting. `KeyboardInterrupt` isn't `stopping` the process. I guess because it just stops a single call to process. Using osX Activity Monitor to achieve that. – MikeiLL Aug 18 '14 at 22:25
  • signals and multiprocessing really don't like each other – johntellsall Aug 18 '14 at 22:26
  • Any tips on how I might go about profiling? I'm getting an error when trying to add the `@profile` decorator I was using earlier: `cPickle.PicklingError: Can't pickle : attribute lookup __builtin__.function failed` – MikeiLL Aug 19 '14 at 01:29
  • Get a serial version to work that handles a single (small) audio file. Then extend it to handle multiple files, or a single large file. With progress, things will become clearer. – johntellsall Aug 19 '14 at 01:43