I am using MPI and h5py/HDF5 (HDF5 and h5py were compiled with parallel support, and everything runs on Python 3.4) on a cluster to stitch a dataset of overlapping tiles (200 or more images/numpy arrays of 2048x2048).
Each tile has an assigned index number that corresponds to the position where it should be written, and all the indices are stored in an array:
Example tile_array:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]], dtype=int32)
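Each index maps to a fixed target window in the final stitched dataset, roughly like this (the tile size, overlap and 4x4 grid shape below are illustrative numbers, not my real parameters):

import numpy as np

tile_size = 2048            # nominal tile edge in pixels
overlap = 128               # hypothetical overlap between adjacent tiles
step = tile_size - overlap  # distance between the origins of adjacent tiles
grid_shape = (4, 4)         # shape of tile_array

def tile_target_slices(tile_ind):
    # Convert the linear tile index into (row, col) coordinates in the grid
    r, c = np.unravel_index(tile_ind, grid_shape)
    # Top-left corner of the tile inside the stitched dataset
    y0, x0 = r * step, c * step
    return slice(y0, y0 + tile_size), slice(x0, x0 + tile_size)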
The issues I have are:
When running the code on the cluster:
- When I use multiple cores to write the stitched image in parallel, I get randomly missing tiles.
- When I run the stitching serially on rank==0, I still have missing tiles.
- If the dataset is small there are no issues.
- If I do not activate atomic mode for the parallel writing, the number of errors increases. The number of errors also increases if atomic mode is activated when I run the writing on rank==0.
When running the code on a laptop:
- When I run the same code on a single core on a laptop, stripped of the MPI code, no tiles are missing.
Any comments would be greatly appreciated.
Highlights of the procedure:
0) Before stitching, each image of the set is processed and the overlapping edges are blended in order to get a better result in the overlapping regions.
1) I open the HDF5 file where I want to save the data: stitching_file=h5py.File(fpath,'a',driver='mpio',comm=MPI.COMM_WORLD)
2) I create a big dataset that will contain the stitched images.
3) I fill the dataset with zeros (a minimal sketch of steps 1-3 is shown after the writing rounds below).
4) The processed images will be written (added) to the big dataset by different cores. However, adjacent images have overlapping regions. To avoid different cores writing tiles that overlap and corrupting the output, I run four rounds of writing that cover the whole dataset with the following pattern:
Writing rounds:
writing round one: array([ 0, 2, 8, 10], dtype=int32)
writing round two: array([ 1, 3, 9, 11], dtype=int32)
writing round three: array([ 4, 6, 12, 14], dtype=int32)
writing round four: array([ 5, 7, 13, 15], dtype=int32)
If I write on a single rank I don't use this scheme and just write each position sequentially.
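For completeness, here is a minimal sketch of steps 1)-3). The file path, final shape, dtype and the dataset name 'stitched_image' are placeholders, not my actual values:

from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

fpath = 'stitching.h5'              # placeholder path to the output file
final_shape = (4 * 2048, 4 * 2048)  # placeholder size of the final mosaic

# 1) Open the HDF5 file collectively with the MPI-IO driver
stitching_file = h5py.File(fpath, 'a', driver='mpio', comm=comm)

# 2) Create the big dataset. With the mpio driver this call is collective,
#    so every rank must execute it with the same arguments.
stitched_group = stitching_file.require_dataset('stitched_image',
                                                shape=final_shape,
                                                dtype='float64')

# 3) Fill the dataset with zeros, each rank zeroing a disjoint block of rows
my_rows = np.array_split(np.arange(final_shape[0]), size)[rank]
if my_rows.size:
    stitched_group[my_rows[0]:my_rows[-1] + 1, :] = 0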
Pseudocode
# Preprocessing of the images
------------------------------
------------------------------
# MPI data chunking function
def tasks_chunking(tasks, size):
    # This function scatters any kind of list, not only ordered ones.
    # If the number of cores is bigger than the length of the list,
    # some chunks will have length zero and the corresponding cores
    # will receive no tasks.
    # Chunk the list of tasks
    Chunked_list = np.array_split(tasks, size)
    NumberChunks = np.zeros(len(Chunked_list), dtype='int32')
    for idx, chunk in enumerate(Chunked_list):
        NumberChunks[idx] = len(chunk)
    # Calculate the displacement of each chunk in the flattened task list
    Displacement = np.zeros(len(NumberChunks), dtype='int32')
    for idx, count in enumerate(NumberChunks[0:-1]):
        Displacement[idx + 1] = Displacement[idx] + count
    return Chunked_list, NumberChunks, Displacement
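# Example of the chunking: with 16 tile positions and size=5 cores,
# np.array_split gives chunks of length [4, 3, 3, 3, 3], so
# NumberChunks == [4, 3, 3, 3, 3] and Displacement == [0, 4, 7, 10, 13].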
# Flush the file after the preprocessing
stitching_file.flush()
# Writing dictionary. Each group will contain a subset of the tiles to be written.
# Data has the size of the final array where the data will be saved.
fake_tile_array = np.arange(Data.size, dtype='int32')
fake_tile_array = fake_tile_array.reshape(Data.shape)
writing_dict = {}
writing_dict['even_group1'] = fake_tile_array[::2, ::2].flat[:]
writing_dict['even_group2'] = fake_tile_array[::2, 1::2].flat[:]
writing_dict['odd_group1']  = fake_tile_array[1::2, ::2].flat[:]
writing_dict['odd_group2']  = fake_tile_array[1::2, 1::2].flat[:]
# Without activating the atomic mode the number of errors in parallel mode is higher.
stitching_file.atomic = True
# Loop through the dictionary items to start the writing
for key, item in writing_dict.items():
    # Chunk the positions that need to be written
    if rank == 0:
        Chunked_list, NumberChunks, Displacement = tasks_chunking(item, size)
    else:
        NumberChunks = None
        Displacement = None
        Chunked_list = None
    # Make each core aware of how many positions it will receive.
    # cnt is filled by the Scatter call and is different on every core.
    cnt = np.zeros(1, dtype='int32')
    comm.Scatter(NumberChunks, cnt, root=0)
    # Define the local buffer that will be filled with the scattered data;
    # its size on each core is given by cnt
    xlocal = np.zeros(cnt, dtype='int32')
    # Scatter the tile positions to the different cores
    comm.Scatterv([item, NumberChunks, Displacement, MPI.INT], xlocal, root=0)
    for tile_ind in xlocal:
        # This writing function is the same one called when I run the writing
        # on rank 0 or on a single core on a laptop.
        paste_in_final_image(joining, temp_file, stitched_group, tile_ind, nr_pixels)
    comm.Barrier()
stitching_file.flush()
stitching_file.close()