Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)
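
As a rough illustration of the features above, here is a minimal sketch using the h5py Python wrapper. The file, group, dataset, and attribute names are made up for this example and do not come from any particular project. It writes one dataset with a metadata attribute, then reads both back:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file, group, and dataset names, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write: groups behave like directories, datasets like arrays stored on disk.
with h5py.File(path, "w") as f:
    grp = f.create_group("run_001")
    dset = grp.create_dataset(
        "voltage", data=np.arange(12, dtype="f8").reshape(3, 4)
    )
    dset.attrs["units"] = "mV"  # metadata is stored alongside the data

# Read: slicing a dataset copies the selected values into a NumPy array.
with h5py.File(path, "r") as f:
    dset = f["run_001/voltage"]
    values = dset[...]
    units = dset.attrs["units"]

print(values.shape, units)
```

Slicing with dset[...] loads the whole dataset into memory; for datasets larger than RAM, a partial slice such as dset[0:1000] reads only that portion.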

2598 questions
35 votes, 6 answers

Read HDF5 file into numpy array

I have the following code to read an HDF5 file as a NumPy array: hf = h5py.File('path/to/file', 'r'); n1 = hf.get('dataset_name'); n2 = np.array(n1). When I print n2 I get this: Out[15]: array([[, , …
e9e9s
34 votes, 3 answers

How to install h5py (needed for Keras) on macOS with M1?

I have an M1 MacBook. I have installed python 3.9.1 using pyenv, and have pip3 version 21.0.1. I have installed homebrew and hdf5 1.12.0_1 via brew install hdf5. When I type pip3 install h5py I get the error: Requirement already satisfied:…
Racing Tadpole
34 votes, 2 answers

Incremental writes to hdf5 with h5py

I have a question about how best to write to HDF5 files with Python / h5py. I have data like:
| timepoint | voltage1 | voltage2 | ...
| 178 | 10 | 12…
user116293
34 votes, 2 answers

Improve pandas (PyTables?) HDF5 table write performance

I've been using pandas for research for about two months now, to great effect. With large numbers of medium-sized trace event datasets, pandas + PyTables (the HDF5 interface) does a tremendous job of allowing me to process heterogeneous data using all…
Peter Gaultney
32 votes, 3 answers

HDF5 in Java: What are the differences between the available APIs?

I've just discovered the HDF5 format and I'm considering using it to store 3D data spread over a cluster of Java application servers. I have found out that there are several implementations available for Java, and would like to know the differences…
Sebastien Diot
31 votes, 3 answers

Convert large csv to hdf5

I have a 100M-line CSV file (actually many separate CSV files) totaling 84GB. I need to convert it to an HDF5 file with a single float dataset. I used h5py in testing without any problems, but now I can't do the final dataset without running out of…
jmilloy
31 votes, 6 answers

Combining hdf5 files

I have a number of HDF5 files, each of which has a single dataset. The datasets are too large to hold in RAM. I would like to combine these files into a single file containing all datasets separately (i.e. not to concatenate the datasets into a…
Bitwise
30 votes, 4 answers

Install HDF5 and PyTables in Ubuntu

I am trying to install the tables package in Ubuntu 14.04 using PyCharm and its package installer, but it seems to be complaining about the HDF5 package. However, it seems like I cannot find any…
codeKiller
30 votes, 3 answers

Storing a list of strings to a HDF5 Dataset from Python

I am trying to store a variable-length list of strings to an HDF5 Dataset. The code for this is: import h5py; h5File=h5py.File('xxx.h5','w'); strList=['asas','asas','asas']; h5File.create_dataset('xxx',(len(strList),1),'S10',strList); h5File.flush()…
gman
29 votes, 4 answers

Deleting hdf5 dataset using h5py

Is there any way to remove a dataset from an hdf5 file, preferably using h5py? Or alternatively, is it possible to overwrite a dataset while keeping the other datasets intact? To my understanding, h5py can read/write hdf5 files in 5 modes f =…
hsnee
28 votes, 1 answer

Pandas can't read hdf5 file created with h5py

I get a pandas error when I try to read HDF5 format files that I have created with h5py. I wonder if I am just doing something wrong? import h5py; import numpy as np; import pandas as pd; h5_file = h5py.File('test.h5',…
Masha L.
27 votes, 4 answers

What are the disadvantages of using .Rdata files compared to HDF5 or netCDF?

I have been asked to change software that currently exports .Rdata files so that it exports in a 'platform independent binary format' such as HDF5 or netCDF. Two reasons were given: (1) Rdata files can only be read by R; (2) binary information is stored…
David LeBauer
26 votes, 1 answer

MATLAB: Differences between .mat versions

The official documentation states the following: . But I have noticed that there are other important differences besides those stated in the table above. For example, saving a cell array with about 6,000 elements that occupies 176 MB of memory in…
Amelio Vazquez-Reina
26 votes, 4 answers

Python, PyTables, Java - tying all together

Question in a nutshell: What is the best way to get Python and Java to play nice with each other? More detailed explanation: I have a somewhat complicated situation. I'll try my best to explain both in pictures and words. Here's the current system…
I82Much
26 votes, 2 answers

How to export an HDF5 file to NumPy using h5py?

I have an existing HDF5 file with three arrays, and I want to extract one of the arrays using h5py.
l.z.lz