Questions tagged [hdf5]

The Hierarchical Data Format version 5 (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number or size of datasets
  • Flexible in the kind and structure of data and metadata
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools available (Python, MATLAB, Java, …)


2598 questions
12 votes, 2 answers

Converting HDF5 to Parquet without loading into memory

I have a large dataset (~600 GB) stored in HDF5 format. As this is too large to fit in memory, I would like to convert this to Parquet format and use pySpark to perform some basic data preprocessing (normalization, finding correlation matrices,…
Eweler
12 votes, 2 answers

Storing scipy sparse matrix as HDF5

I want to compress and store a humongous Scipy matrix in HDF5 format. How do I do this? I've tried the below code: a = csr_matrix((dat, (row, col)), shape=(947969, 36039)) f = h5py.File('foo.h5','w') dset = f.create_dataset("init", data=a, dtype…
Rama
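h5py cannot store a scipy sparse matrix object directly (hence the TypeError in the attempt above), but the three CSR component arrays round-trip cleanly. A sketch, with arbitrary group and dataset names:

```python
# Save/load a CSR matrix by storing its data, indices, and indptr arrays.
import h5py
from scipy.sparse import csr_matrix

def save_csr(path, name, m):
    with h5py.File(path, "w") as f:
        g = f.create_group(name)
        g.create_dataset("data", data=m.data, compression="gzip")
        g.create_dataset("indices", data=m.indices, compression="gzip")
        g.create_dataset("indptr", data=m.indptr, compression="gzip")
        g.attrs["shape"] = m.shape  # needed to rebuild the matrix

def load_csr(path, name):
    with h5py.File(path, "r") as f:
        g = f[name]
        return csr_matrix((g["data"][:], g["indices"][:], g["indptr"][:]),
                          shape=tuple(g.attrs["shape"]))
```

`compression="gzip"` handles the "compress" half of the question; only the non-zero entries are stored.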
12 votes, 1 answer

h5py cannot convert element 0 to hsize_t

I have a boatload of images in a hdf5-file that I would like to load and analyse. Each image is 1920x1920 uint16 and loading all of them into memory crashes the computer. I have been told that others work around that by slicing the image, e.g.…
DonMP
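The slicing workaround mentioned in the question looks roughly like this: index the open dataset instead of reading it whole, and h5py fetches only the requested frame from disk. File and dataset names here are hypothetical:

```python
# Process a stack of images one frame at a time, never loading the whole stack.
import h5py
import numpy as np

def image_means(path, dataset):
    with h5py.File(path, "r") as f:
        dset = f[dataset]            # handle only; no data read yet
        means = np.empty(dset.shape[0])
        for i in range(dset.shape[0]):
            img = dset[i]            # reads a single 1920x1920 frame
            means[i] = img.mean()
    return means
```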
12 votes, 3 answers

For python, install hdf5/netcdf4

Doing this on a Linux Mint 17.1. When I try: pip install hdf5 I get the error "Could not find a version that satisfies the requirement hdf5 (from versions: ) No matching distribution found for hdf5" I'm trying in the long run to install netcdf4…
confused
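The error here is expected: there is no PyPI package named `hdf5`. The Python bindings are published as `h5py`, and the netCDF bindings as `netCDF4`; modern wheels for both bundle the C libraries. A sketch of the usual install path (the apt package names are the Debian/Mint ones):

```shell
# There is no "hdf5" package on PyPI; the bindings are named h5py.
pip install h5py netCDF4

# On older systems without wheel support, install the C libraries first:
sudo apt-get install libhdf5-dev libnetcdf-dev
pip install h5py netCDF4
```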
12 votes, 2 answers

Store datetimes in HDF5 with H5Py

How can I store NumPy datetime objects in HDF5 using h5py? In [1]: import h5py In [2]: import numpy as np In [3]: f = h5py.File('foo.hdfs', 'w') In [4]: d = f.create_dataset('data', shape=(2, 2), dtype=np.datetime64) TypeError: No conversion path…
MRocklin
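HDF5 has no native datetime type, which is why h5py reports "no conversion path" for datetime64. One portable workaround is to store the values as int64 ticks plus a unit attribute and view them back on load (recent h5py versions also offer `h5py.opaque_dtype` for this). A sketch with arbitrary names:

```python
# Store datetime64 values as int64 nanosecond ticks plus a unit attribute.
import h5py
import numpy as np

def save_datetimes(path, name, times):
    times = np.asarray(times, dtype="datetime64[ns]")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(name, data=times.view("int64"))
        dset.attrs["unit"] = "ns"   # remember how to reinterpret the ints

def load_datetimes(path, name):
    with h5py.File(path, "r") as f:
        raw = f[name][:]
        unit = f[name].attrs["unit"]
        return raw.view(f"datetime64[{unit}]")
```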
12 votes, 2 answers

Reading a large table with millions of rows from Oracle and writing to HDF5

I am working with an Oracle database with millions of rows and 100+ columns. I am attempting to store this data in an HDF5 file using pytables with certain columns indexed. I will be reading subsets of these data in a pandas DataFrame and performing…
smartexpert
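The usual shape of the answer: read the table in chunks with `pandas.read_sql` and append each chunk to a pytables-backed `HDFStore`, with `data_columns` naming the columns to index for later `where=` queries. A sketch using sqlite3 as a stand-in for the Oracle connection (in practice cx_Oracle/oracledb would supply `conn`); query and column names are hypothetical:

```python
# Stream a SQL table into an appendable, queryable HDF5 store in chunks.
import pandas as pd

def sql_to_hdf(conn, query, h5_path, key, chunksize=50_000, data_columns=None):
    with pd.HDFStore(h5_path, mode="w") as store:
        for chunk in pd.read_sql(query, conn, chunksize=chunksize):
            # format="table" makes the store appendable; data_columns
            # enables where= filtering on those columns when reading back
            store.append(key, chunk, format="table", data_columns=data_columns)
```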
11 votes, 3 answers

HDF5 Warnings When Accessing Xarray DataSet

I'd like to understand what is causing the warning messages that I'm getting in the following scenario: In an earlier operation I've created some NetCDF files and saved them to disk using xarray.to_netcdf(). Lazy evaluation of these datasets is…
jpolly
11 votes, 1 answer

Loading hdf5 files into python xarrays

The python module xarray greatly supports loading/mapping netCDF files, even lazily with dask. The data source I have to work with are thousands of hdf5 files, with lots of groups, datasets, attributes - all created with h5py. The Question is: How…
fmfreeze
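When files were written with plain h5py (no netCDF-style dimension metadata), xarray cannot name the dimensions on its own. One route is to wrap each dataset in a `DataArray` by hand with synthetic dimension names; a minimal eager-loading sketch (names are hypothetical, and for lazy loading the h5netcdf backend is the usual alternative):

```python
# Build an xarray.Dataset from the datasets in one HDF5 group.
import h5py
import xarray as xr

def h5_group_to_xarray(path, group="/"):
    with h5py.File(path, "r") as f:
        g = f[group]
        data_vars = {}
        for name, dset in g.items():
            if isinstance(dset, h5py.Dataset):
                # h5py files carry no dimension names, so invent stable ones
                dims = [f"{name}_dim{i}" for i in range(dset.ndim)]
                data_vars[name] = xr.DataArray(dset[:], dims=dims,
                                               attrs=dict(dset.attrs))
        return xr.Dataset(data_vars, attrs=dict(g.attrs))
```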
11 votes, 3 answers

Concatenate a large number of HDF5 files

I have about 500 HDF5 files each of about 1.5 GB. Each of the files has the same exact structure, which is 7 compound (int,double,double) datasets and variable number of samples. Now I want to concatenate all these files by concatenating each of the…
Andrea Zonca
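The standard approach is to create each output dataset as extendable (`maxshape=None` along the sample axis) and then resize-and-append per input file, so no file is held whole beyond one dataset at a time. A sketch assuming, as in the question, that every file has the same flat set of root-level datasets:

```python
# Concatenate same-structured HDF5 files along axis 0 of each dataset.
import h5py

def concat_h5(out_path, in_paths):
    with h5py.File(out_path, "w") as out:
        for path in in_paths:
            with h5py.File(path, "r") as f:
                for name, dset in f.items():
                    data = dset[:]
                    if name not in out:
                        # first file: create an extendable dataset
                        out.create_dataset(
                            name, data=data,
                            maxshape=(None,) + data.shape[1:],
                            chunks=True)
                    else:
                        o = out[name]
                        o.resize(o.shape[0] + data.shape[0], axis=0)
                        o[-data.shape[0]:] = data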
11 votes, 8 answers

OSError: Unable to open file (unable to open file)

I am trying to load a pre_trained model named "tr_model.h5" for my assignment but I get the following error: Traceback (most recent call last): File "Trigger_Project.py", line 84, in model = load_model(filename) File "Trigger_Project.py",…
Neeraj
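This error almost always means the path is wrong relative to the current working directory, not that the file is corrupt. Checking first gives a far clearer message than the h5py traceback; a small stdlib-only sketch:

```python
# Verify a model file exists before handing it to load_model.
import os

def checked_path(filename):
    path = os.path.abspath(filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Model file not found: {path} (cwd is {os.getcwd()})")
    return path

# model = load_model(checked_path("tr_model.h5"))  # the Keras call in question
```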
11 votes, 3 answers

MATLAB: Saving several variables to "-v7.3" (HDF5) .mat-files seems to be faster when using the "-append" flag. How come?

NOTE: This question deals with an issue observed back in 2011 with an old MATLAB version (R2009a). As per the update below from July 2016, the issue/bug in MATLAB seems to no longer exist (tested with R2016a; scroll down to end of question to see…
Ole Thomsen Buus
11 votes, 2 answers

read HDF5 file to pandas DataFrame with conditions

I have a huge HDF5 file, I want to load part of it in a pandas DataFrame to perform some operations, but I am interested in filtering some rows. I can explain better with an example: Original HDF5 file would look something like: A B C D 1 …
codeKiller
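If the file is written in pandas' `format="table"` with `data_columns` set, `read_hdf` can push the row filter down to pytables so only matching rows are loaded. A sketch using the column letters from the question (the condition string is an example):

```python
# Write a queryable HDF5 table and read back only the rows matching a filter.
import pandas as pd

def write_queryable(path, key, df, columns):
    # data_columns makes these columns usable in where= expressions
    df.to_hdf(path, key=key, format="table", data_columns=columns)

def read_filtered(path, key, condition):
    # e.g. condition = "A > 2 & B < 40"
    return pd.read_hdf(path, key, where=condition)
```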
11 votes, 1 answer

How to store an array in hdf5 file which is too big to load in memory?

Is there any way to store an array in an hdf5 file, which is too big to load in memory? if I do something like this f = h5py.File('test.hdf5','w') f['mydata'] = np.zeros(2**32) I get a memory error.
Sounak
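The memory error comes from building `np.zeros(2**32)` in RAM first, not from HDF5. Creating the dataset by shape alone allocates nothing in memory, and it can then be filled block by block; a sketch (sizes scaled down from the question):

```python
# Create a dataset by shape only, then fill it one block at a time.
import h5py
import numpy as np

def create_and_fill(path, name, n, block=2**20):
    with h5py.File(path, "w") as f:
        # shape-only creation: no data is held in RAM
        dset = f.create_dataset(name, shape=(n,), dtype="f8", chunks=True)
        for start in range(0, n, block):
            stop = min(start + block, n)
            dset[start:stop] = np.zeros(stop - start)  # one block in RAM
```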
11 votes, 1 answer

Appending Column to Frame of HDF File in Pandas

I am working with a large dataset in CSV format. I am trying to process the data column-by-column, then append the data to a frame in an HDF file. All of this is done using Pandas. My motivation is that, while the entire dataset is much bigger than…
lstyls
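Pandas `HDFStore` tables append rows, not columns, so column-by-column processing needs a different layout. One workaround is to store each processed column under its own key and join them at read time; a sketch (the `cols/` key scheme is hypothetical):

```python
# Store each column under its own HDF5 key; assemble a frame on read.
import pandas as pd

def append_column(store_path, col_name, series):
    series.to_frame(col_name).to_hdf(store_path, key=f"cols/{col_name}",
                                     format="fixed", mode="a")

def read_columns(store_path, col_names):
    frames = [pd.read_hdf(store_path, f"cols/{c}") for c in col_names]
    return pd.concat(frames, axis=1)
```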
11 votes, 5 answers

How to store wide tables in pytables / hdf5

I have data coming from a csv which has a few thousand columns and ten thousand (or so) rows. Within each column the data is of the same type, but different columns have data of different type*. Previously I have been pickling the data from numpy…
acrophobia
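For frames this wide, the queryable `format="table"` layout tends to struggle with thousands of columns, while the `format="fixed"` layout stores each dtype block whole and handles mixed per-column types without pickling. A minimal round-trip sketch (column counts scaled down from the question):

```python
# Store a wide, mixed-dtype frame in HDF5 "fixed" format via pandas.
import pandas as pd

def save_wide(path, df):
    # fixed format: fast whole-frame reads/writes, but no where= querying
    df.to_hdf(path, key="wide", format="fixed", mode="w")

def load_wide(path):
    return pd.read_hdf(path, "wide")
```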