I'm trying to parallelize my Python code with MPI. I read the input from a txt file and write the output to an HDF5 file. When I submit my job to the queue (just one node, 32 processors per node), I get the following error when I open the output HDF5 file ($ h5dump groups_25x25x25.hdf5):
"h5dump error: internal error (file h5dump.c:line 1615) HDF5: infinite loop closing library D,G,T,F,FD,P,FD,P,FD,P,E,E,SL,FL,FL,FL,FL,FL,... (the FL entry repeats roughly a hundred more times)"
When I run the program directly from my home directory (mpiexec -l -n 4 python find_bounded_v2.py), I get the same error: h5dump error: internal error (file h5dump.c:line 1615)
I checked my run.log and the program itself runs fine, so the problem probably has to do with the input/output.
Here is the part of the code that does the I/O and might be the source of the error:
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
[SOME OTHER STUFF]
numbounded = np.zeros(1)
totalbounded = np.zeros(1)
f = h5py.File("groups_25x25x25.hdf5", "w")
bh = f.create_group("bounded_neighbors")
for k in range(local_start, local_end):
    target = ids[k]
    bound_ids = Energy(target_halo, np.array(mvir), np.array(x), np.array(y),
                       np.array(z), np.array(vx), np.array(vy), np.array(vz),
                       np.array(ids))
    bh.create_dataset(str(target), data=bound_ids)
    numbounded[0] += len(bound_ids)
comm.Allreduce(numbounded, totalbounded, op=MPI.SUM)
[SOME OTHER STUFF]
f.close()
My log files all look fine, but the output file just isn't readable; I keep getting the error above. I'm new to MPI, so any suggestions for fixes would be appreciated.