How to combine multiple .h5 files (with the same shape) using Python?

I have 10,000 .h5 files containing 3D point cloud data.

They all have the same shape.

I would like to combine (or merge) them in batches of 2,000 files, so that I end up with 5 big .h5 files in total (similar to the append() function in Python).

I found the copy() function in h5py (http://docs.h5py.org/en/latest/high/group.html).

However, I have not been able to apply that method to my problem.

Please point me to example code or help me solve my problem.

Sorry for my poor English skills.

hjsg1010
  • 1
    With `h5py`, the contents of a file are organized in Groups and Datasets. Groups behave like Python dictionaries, Datasets like `numpy` arrays. We need to know how the source files are organized and how you want the target file to be organized. It should be straightforward to load datasets from the sources and save them to the target. It's also possible to concatenate many datasets/arrays into one much larger one. But first, what's the organization? – hpaulj Oct 24 '18 at 01:49
  • @hpaulj This is my dataset to be combined (imgur.com/a/4gahSD4). As you said, I want to concatenate (append) these data into one big .h5 file. The answer below does not fix my problem. Could you help me? – hjsg1010 Oct 25 '18 at 06:48
  • 1
    Big files or big arrays? An HDF5 file can have many datasets (arrays) – hpaulj Oct 25 '18 at 07:10
  • @hpaulj Big arrays, I guess. For example, say A.h5 has a group named 'label', and 'label' is a 20*1 array, and B.h5 and C.h5 have the same layout as A.h5. Then I want to get D.h5, which combines A, B, and C, so that D.h5 has a group named 'label' and 'label' is a 60*1 array. – hjsg1010 Oct 25 '18 at 08:18

1 Answer

You can simply do something like this (untested but should work):

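import h5py

# Placeholder list: replace with the paths of your input files.
files = ['a.h5', 'b.h5']

def copy(dest, name):
    g = dest.require_group(name)  # create output group with the name of input file
    def callback(name, node):
        if isinstance(node, h5py.Dataset):  # only copy datasets
            # node[()] reads the whole dataset; parent groups in `name` are created as needed
            g.create_dataset(name, data=node[()])
    return callback  # visititems() needs a callable, so hand the closure back

with h5py.File('out.h5', 'w') as h5_out:
    for f_in in files:
        with h5py.File(f_in, 'r') as h5_in:
            h5_in.visititems(copy(h5_out, f_in))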

This would create a "folder" (HDF5 group) for each of the files and copy all contents there, recursively.
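
If the goal is the one described in the comments (three files that each hold a 20*1 'label' array becoming one file with a 60*1 'label' array), the datasets have to be concatenated rather than placed in separate groups. Below is a minimal sketch of that idea, not a tested solution for the asker's data. It assumes every input file has the same top-level datasets, that each dataset is at least 1-D, and that all shapes match except along the first axis. The function names `concat_h5` and `concat_in_batches`, and the `merged_` output prefix, are made up for illustration.

```python
import h5py

def concat_h5(in_paths, out_path):
    """Concatenate same-named top-level datasets from in_paths along axis 0."""
    with h5py.File(out_path, 'w') as h5_out:
        for path in in_paths:
            with h5py.File(path, 'r') as h5_in:
                for key, node in h5_in.items():
                    if not isinstance(node, h5py.Dataset):
                        continue  # this sketch ignores nested groups
                    data = node[()]  # read the whole dataset into memory
                    if key not in h5_out:
                        # maxshape=None on axis 0 makes the dataset resizable
                        h5_out.create_dataset(
                            key, data=data,
                            maxshape=(None,) + data.shape[1:], chunks=True)
                    else:
                        dset = h5_out[key]
                        old = dset.shape[0]
                        # grow along axis 0, then write the new rows at the end
                        dset.resize(old + data.shape[0], axis=0)
                        dset[old:] = data

def concat_in_batches(in_paths, batch_size=2000, prefix='merged_'):
    """Write one output file per batch of input files (hypothetical naming)."""
    for i in range(0, len(in_paths), batch_size):
        concat_h5(in_paths[i:i + batch_size], f'{prefix}{i // batch_size}.h5')
```

With 10,000 input paths and the default batch size of 2,000, `concat_in_batches` would write five output files. Only one input file is held in memory at a time, since each output dataset is grown on disk with `resize`.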

See also: related question.
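
The copy() method mentioned in the question can do the same per-file grouping without a visitor callback. The following is a rough sketch under the assumption that one output group per input file is acceptable. The function name `copy_files_into_groups` and the `file_<i>` group names are hypothetical.

```python
import h5py

def copy_files_into_groups(in_paths, out_path):
    """Copy every top-level object of each input file into its own group."""
    with h5py.File(out_path, 'w') as h5_out:
        for i, path in enumerate(in_paths):
            g = h5_out.create_group(f'file_{i}')  # hypothetical group naming
            with h5py.File(path, 'r') as h5_in:
                for key in h5_in:
                    # Group.copy copies a dataset, or a whole subgroup recursively,
                    # into the destination group under the same name
                    h5_in.copy(key, g)
```

Like the visitor-based version, this keeps the inputs side by side and does not merge arrays.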

aldanor