6

I'm reading a bunch of NetCDF files using the pupynere interface (on Linux). The following code results in an mmap error:

import numpy as np
import os, glob
from pupynere import NetCDFFile as nc
alts = []
vals = []
path='coll_mip'
filter='*.nc'
for infile in glob.glob(os.path.join(path, filter)):
        curData = nc(infile,'r')
        vals.append(curData.variables['O3.MIXING.RATIO'][:])
        alts.append(curData.variables['ALTITUDE'][:])
        curData.close()

Error:

$ python2.7 /mnt/grid/src/profile/contra.py
Traceback (most recent call last):
  File "/mnt/grid/src/profile/contra.py", line 15, in <module>
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 159, in __init__
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 386, in _read
  File "/usr/lib/python2.7/site-packages/pupynere-1.0.13-py2.7.egg/pupynere.py", line 446, in _read_var_array
mmap.error: [Errno 24] Too many open files

Interestingly, if I comment out one of the append calls (either will do!) it works! What am I doing wrong? I'm closing the file, right? This is somehow related to the Python list. I used a different, inefficient approach before (always copying each element) that worked.

PS: `ulimit -n` yields 1024; the program fails at file number 498.

Maybe related to (though its solution doesn't work for me): NumPy and memmap: [Errno 24] Too many open files

Sebastian
  • Python (like Perl) has a "debug mode" you can use to "sort-of see what's going on inside libraries"? Try that. It MIGHT be of some assistance. Could you also debug-print the number of open file handles (somehow ;-) within the loop... I'm guessing it's opening TWO file handles per iteration, just based on the 498 (a bit less than half of 1024, and Python would have some files open itself, maybe 25-odd?). – corlettk Apr 29 '11 at 09:46
  • Thanks for the useful comment. `python2.7 -d` doesn't yield further information (I'm guessing debugging wasn't enabled when Python was compiled). It would indeed be interesting to track the number of open files. How am I doing that? (one approach is sketched after these comments) – Sebastian Apr 29 '11 at 09:56
  • See sehe's "answer"... he tells us how to trace the open file handles on Linux ;-) – corlettk Apr 29 '11 at 10:09
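
On Linux, one way to track the number of open files from inside the loop is to count the entries under `/proc/self/fd`. A minimal sketch (Linux-specific; `count_open_fds` is a hypothetical helper, not part of the original thread):

import os

def count_open_fds():
    # Each entry under /proc/self/fd is one open file descriptor.
    # (Listing the directory itself briefly opens one extra descriptor.)
    return len(os.listdir('/proc/self/fd'))

Calling this once per loop iteration would show whether descriptors really accumulate two at a time, as guessed above.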

4 Answers

7

My guess is that the `mmap.mmap` call in pupynere is holding the file descriptor open (or creating a new one). What if you do this:

vals.append(curData.variables['O3.MIXING.RATIO'][:].copy())
alts.append(curData.variables['ALTITUDE'][:].copy())
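
The underlying gotcha: a slice of a memory-mapped array is a view that keeps the mapping, and with it the file descriptor, alive until the last view is garbage-collected. A minimal illustration with plain NumPy (using a hypothetical `data.bin` of float64 values, not the question's NetCDF files):

import numpy as np

mm = np.memmap('data.bin', dtype='float64', mode='r')
view = mm[:]             # a view: references mm, so the mapping stays open
snapshot = mm[:].copy()  # an independent copy: holds no reference to the mapping
del mm, view             # only now, with the last view gone, can the mapping be freed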
  • That's it! The optimization with a Python list prevented the closure of the files. Thanks a lot. – Sebastian May 02 '11 at 08:57
  • Nice one. That's a bit of a gotcha for handling many memory-mapped files in Python, isn't it?!?!? It really **should** be documented, IMHO. – corlettk May 03 '11 at 06:57
  • It's not so much memory mapping, as that slices return views by default. You can get similar behavior by repeatedly creating a large NxN array and slicing one row from it to store in a list, repeating K times. Even though you might think you have only KxN memory in use, it's actually KxNxN, since the original array can't be reclaimed while there are "views" on it. –  May 03 '11 at 08:06
  • Why is the `[:]` necessary along with the `copy` method? I thought `[:]` implied a copy? – Louis Thibault Sep 21 '16 at 12:28
3

@corlettk: yeah, since it is Linux, tracing the file syscalls with `strace` will do:

strace -e trace=file,desc,munmap python2.7 /mnt/grid/src/profile/contra.py

This will show exactly which file is opened when - and even the file descriptors.

You can also use

ulimit -a

to see what limits are currently in effect.

Edit

gdb --args python2.7 /mnt/grid/src/profile/contra.py
(gdb) break dup
(gdb) run

If that results in too many breakpoints prior to the ones related to the mapped files, you might want to run it without breakpoints for a while, break it manually (Ctrl+C) and set the breakpoint during 'normal' operation; that is, if you have enough time for that :)

Once it breaks, inspect the call stack with

(gdb) bt
sehe
  • I ran `strace`, but I'm unable to comprehend the output. There are tons of Python `open()` calls (no close), then my *netcdf* files are opened, e.g. `open("coll_mip/MIP.nc", O_RDONLY|O_LARGEFILE) = 3`, then there is the traceback. See [here](http://bashseb.dyndns.org/strace-log.txt) – Sebastian Apr 29 '11 at 10:14
  • Well, it is surprising that the files never seem to be closed; however, the open calls return `3` (a file descriptor) each time. I'm not familiar with that happening, but I suspect there is something in the memory-mapping code that holds on to all memory-mapped files for some reason – sehe Apr 29 '11 at 10:23
  • Yeah... that MUST be it... For some unfathomable reason NetCDFFile still has the file cached (and a handle open) even after we've told it to close the sucker... some sort of "performance optimisation" to save time constantly reopening the file within a tight loop, and it'll actually close the file if it's not "reopened" within a given timeframe (I've used this trick myself, sigh)... is there a "sleep" in Python? I'm a Python noob, at best. If so, try pausing for, let's say, 5 seconds at, let's say, the 100th iteration... and see what happens in the strace. – corlettk Apr 29 '11 at 10:32
  • @Sebastian, how about `-e trace=file,desc,munmap` output? That'll get the file-descriptor based syscalls (`close()` and `mmap()` :) and show when the mappings are torn down again. – sarnold Apr 29 '11 at 10:37
  • I'm trying this in a minute. Just for now: `strace` doesn't even show a single `close` call when running a `print "Hello World"` script. A lot of files are being opened, none are closed. – Sebastian Apr 29 '11 at 10:40
  • @Seb, that's because `-e trace=file` shows syscalls that take file _names_ as arguments; `close(2)` takes a file _descriptor_ as an argument. A little annoying :) which is one reason why I almost never use the `trace` feature; seeing _all_ syscalls is often more instructive, and `grep -v` can remove the ones I don't want to see after the fact. But for remote debugging, it seems fair enough to filter a little. :) – sarnold Apr 29 '11 at 10:42
  • Thank you all so far! [Here](http://bashseb.dyndns.org/strace-log-updated.txt) is the output of `strace -e trace=file,desc,munmap python2.7 contra.py` (unfiltered, sorry @sarnold) – Sebastian Apr 29 '11 at 10:47
  • Could you find out where the dup(3) calls originate from? (If all else fails you _could_ employ gdb for this, see answer inline) – sehe Apr 29 '11 at 10:51
  • gdb: (no debugging symbols found), thus I can't set the breakpoint :( pupynere.py (the library for netcdf I/O) doesn't include direct `dup` calls. – Sebastian Apr 29 '11 at 10:59
  • Wow! I didn't expect all those `dup(3)` calls. At the end of each block, it releases _most_, but misses 6 11 25 31 12 17 25 26 31 33 34 35 ... – sarnold Apr 29 '11 at 11:03
  • @Sebastian: It'd be great to hear of the solution to this problem, both for your (and your users') sake... and out of pure curiosity. It's a curly one alright... the kind that yields a sense of satisfaction when you're even a small part of solving it. GOOD LUCK!!! – corlettk Apr 30 '11 at 00:38
  • @corlettk @sarnold @sehe: Thanks to all of you for your dedicated help. I accepted @thouis's answer because it solves the problem most easily. Your comments/answers definitely helped in narrowing down the answer! – Sebastian May 02 '11 at 12:15
2

Hmmm... Maybe, just maybe, `with curData` might fix it? Just a WILD guess.


EDIT: Does curData have a `flush()` method, perchance? Have you tried calling that before `close()`?


EDIT 2: Python 2.5's `with` statement (lifted straight from Understanding Python's "with" statement):

with open("x.txt") as f:
    data = f.read()
    do something with data

... basically it ALWAYS closes the resource (much like C#'s using construct).
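
A caveat: pupynere's NetCDFFile may not implement the context-manager protocol itself, in which case `contextlib.closing` (available since Python 2.5) wraps any object that has a `close()` method. A sketch under that assumption, reusing the question's `nc` alias and loop variables:

from contextlib import closing

# closing() guarantees close() is called even if an exception occurs
with closing(nc(infile, 'r')) as curData:
    vals.append(curData.variables['O3.MIXING.RATIO'][:].copy())
    alts.append(curData.variables['ALTITUDE'][:].copy())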

corlettk
  • Yes, there is a `flush()` method. I didn't know it would be of use while reading files (I thought it was only for writing). Calling `curData.flush()` before `curData.close()` doesn't clear the error. I'm not familiar with the usage of `with`. Can you elaborate? – Sebastian Apr 29 '11 at 10:06
  • @Seb... see my edit above... can't format code in SO comments. – corlettk Apr 29 '11 at 10:17
1

How expensive is the `nc()` call? If it is "cheap enough" to run twice on every file, does this work?

for infile in glob.glob(os.path.join(path, filter)):
        curData = nc(infile,'r')
        vals.append(curData.variables['O3.MIXING.RATIO'][:])
        curData.close()
        curData = nc(infile,'r')
        alts.append(curData.variables['ALTITUDE'][:])
        curData.close()
sarnold