Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files
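As a small illustration of the "binary file format" point, the sketch below checks whether a file carries the 8-byte HDF5 format signature, using only the Python standard library. The function name looks_like_hdf5 is hypothetical, and the candidate offsets (0, 512, 1024, 2048, …) follow the superblock search rule as described in the HDF5 file format specification. It is only a heuristic: it does not validate anything beyond the signature.

```python
import os

# 8-byte format signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` has the HDF5 signature at a candidate superblock offset.

    Hypothetical helper for illustration; it checks the signature only.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
            offset = 512 if offset == 0 else offset * 2
    return False
```

In practice a wrapper library usually does this for you; h5py, for example, exposes h5py.is_hdf5(filename).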

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number or size of datasets
  • Flexible in the kind and structure of the data and metadata
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)


2598 questions
14 votes · 3 answers

Test group existence in hdf5/c++

I am opening an existing HDF5 file for appending data; I want to ensure that a group called /A exists for subsequent access. I am looking for an easy way to either create /A conditionally (create and return new group if not existing, or return the…
eudoxos • 18,545 • 10 • 61 • 110
14 votes · 2 answers

Incremental PCA on big data

I just tried using the IncrementalPCA from sklearn.decomposition, but it threw a MemoryError just like the PCA and RandomizedPCA before. My problem is that the matrix I am trying to load is too big to fit into RAM. Right now it is stored in an hdf5…
KrawallKurt • 449 • 1 • 5 • 15
14 votes · 2 answers

h5py setup.py on Mac: hdf5.h file not found

I am building h5py on Mac, following instructions "Building against Parallel HDF5" in this link: http://docs.h5py.org/en/latest/build.html

    $ export CC=mpicc
    $ python setup.py configure --mpi
    $ sudo python setup.py build

I get this…
yanggao • 231 • 4 • 7
14 votes · 2 answers

How to resize an HDF5 array with `h5py`

How can I resize an HDF5 array using the h5py Python library? I've tried using the .resize method on an array with chunks set to True. Alas, I'm still missing something.

    In [1]: import h5py
    In [2]: f = h5py.File('foo.hdf5', 'w')
    In [3]: d =…
MRocklin • 55,641 • 23 • 163 • 235
14 votes · 3 answers

Unable to save DataFrame to HDF5 ("object header message is too large")

I have a DataFrame in Pandas:

    In [7]: my_df
    Out[7]:
    Int64Index: 34 entries, 0 to 0
    Columns: 2661 entries, airplane to zoo
    dtypes: float64(2659), object(2)

When I try to save this to disk:

    store =…
Amelio Vazquez-Reina • 91,494 • 132 • 359 • 564
14 votes · 1 answer

Updating h5py Datasets

Does anyone have an idea for updating HDF5 datasets from h5py? Assuming we create a dataset like:

    import h5py
    import numpy
    f = h5py.File('myfile.hdf5')
    dset = f.create_dataset('mydataset',…
George Monet • 317 • 2 • 3 • 7
14 votes · 5 answers

How can I efficiently save a python pandas dataframe in hdf5 and open it as a dataframe in R?

I think the title covers the issue, but to elucidate: The pandas python package has a DataFrame data type for holding table data in python. It also has a convenient interface to the hdf5 file format, so pandas DataFrames (and other data) can be…
Griffith Rees • 1,285 • 2 • 15 • 24
13 votes · 1 answer

exporting from/importing to numpy, scipy in SQLite and HDF5 formats

There seem to be many choices for Python to interface with SQLite (sqlite3, atpy) and HDF5 (h5py, pyTables) -- I wonder if anyone has experience using these together with numpy arrays or data tables (structured/record arrays), and which of these…
hatmatrix • 42,883 • 45 • 137 • 231
13 votes · 5 answers

HDF5 C++ interface: writing dynamic 2D arrays

I am using the HDF5 C++ API to write 2D array dataset files. The HDF Group has an example to create an HDF5 file from a statically defined array size, which I've modified to suit my needs below. However, I require a dynamic array, where both NX and…
Mike T • 41,085 • 18 • 152 • 203
13 votes · 9 answers

Warning! ***HDF5 library version mismatched error*** python pandas windows

I'm using pandas/python to save a DataFrame in HDFStore format. When I apply the my_data_frame.to_hdf(arguments...) command I get an error message: Warning! ***HDF5 library version mismatched error*** and my program is stopped. I'm working on…
Oscar Mike • 724 • 3 • 8 • 22
13 votes · 2 answers

Saving with h5py arrays of different sizes

I am trying to store about 3000 numpy arrays using the HDF5 data format. The arrays vary in length from 5306 to 121999 np.float64 values. I am getting an "Object dtype dtype('O') has no native HDF5 equivalent" error, since due to the irregular nature of the data numpy…
13 votes · 1 answer

What is the recommended compression for HDF5 for fast read/write performance (in Python/pandas)?

I have read several times that turning on compression in HDF5 can lead to better read/write performance. I wonder what the ideal settings are to achieve good read/write performance at: data_df.to_hdf(..., format='fixed', complib=..., complevel=...,…
Mark Horvath • 1,136 • 1 • 9 • 24
13 votes · 1 answer

Query HDF5 in Pandas

I have the following data (18,619,211 rows) stored as a pandas dataframe object in an hdf5 file:

                  date    id2         w
    id
    100010  1980-03-31  10401  0.000839
    100010  1980-03-31  10604  0.020140
    100010  1980-03-31  …
user3576212 • 3,255 • 9 • 25 • 33
12 votes · 5 answers

Could not find HDF5 installation for PyTables on M1 Mac

Running on M1 Mac, macOS Monterey 12.4, Python 3.10.3

    pip install tables
    Collecting tables
      Using cached tables-3.7.0.tar.gz (8.2 MB)
      Installing build dependencies ... done
      Getting requirements to build wheel ... error
      error:…
Bn.F76 • 783 • 2 • 12 • 30
12 votes · 2 answers

Append data to HDF5 file with Pandas, Python

I have large pandas DataFrames with financial data. I have no problem appending and concatenating additional columns and DataFrames to my .h5 file. The financial data is being updated every minute, and I need to append a row of data to all of my…
Karl • 146 • 1 • 1 • 12