Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)
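
As a small illustration of the "binary file format" point, here is a hedged sketch that checks whether a file carries the HDF5 format signature, using only the Python standard library. The helper name `looks_like_hdf5` and the `max_offset` cap are assumptions made for this example. The signature bytes and the search offsets (0, 512, 1024, 2048, …) follow the HDF5 file format specification. The demo file below contains only the signature, so it is not a valid HDF5 file.

```python
import os
import tempfile

# The 8-byte format signature that opens an HDF5 superblock
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Hypothetical helper: look for the signature at offsets 0, 512, 1024, ..."""
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            chunk = fh.read(8)
            if len(chunk) < 8:
                # Reached the end of the file without finding a signature
                return False
            if chunk == HDF5_SIGNATURE:
                return True
            # Candidate offsets double after the first 512-byte step
            offset = 512 if offset == 0 else offset * 2
    return False


# Demo on two throwaway files (the first only mimics the signature)
tmp_dir = tempfile.mkdtemp()
fake_hdf5 = os.path.join(tmp_dir, "fake.h5")
with open(fake_hdf5, "wb") as fh:
    fh.write(HDF5_SIGNATURE + b"\x00" * 16)

plain = os.path.join(tmp_dir, "plain.txt")
with open(plain, "wb") as fh:
    fh.write(b"not an hdf5 file")

print(looks_like_hdf5(fake_hdf5), looks_like_hdf5(plain))  # True False
```

A signature match only says the file claims to be HDF5. To actually read it, use one of the libraries or wrappers listed above.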

2598 questions
25
votes
1 answer

Discovering keys using h5py in python3

In Python 2.7, I can inspect an HDF5 file's keys using: $ python >>> import h5py >>> f = h5py.File('example.h5', 'r') >>> f.keys() [u'some_key'] However, in Python 3.4, I get something different: $ python3 -q >>> import h5py >>> f =…
user14717
  • 4,757
  • 2
  • 44
  • 68
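
The excerpt above is cut off, so the following is only a guess at what is being asked. A minimal sketch, assuming h5py on Python 3, where `keys()` returns a view object rather than a plain list: wrapping the view in `list()` gives the key names. The file path and its contents are made up so that the snippet runs on its own.

```python
import os
import tempfile

import h5py

# Build a throwaway file so the sketch is self-contained (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("some_key", data=[0, 1, 2])

with h5py.File(path, "r") as f:
    # keys() is a view in Python 3; list() materializes the names
    keys = list(f.keys())
    print(keys)
```

Iterating directly (`for name in f:`) also walks the top-level names without building a list.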
24
votes
4 answers

HDF5 Example code

Using HDF5DotNet, can anyone point me at example code, which will open an hdf5 file, extract the contents of a dataset, and print the contents to standard output? So far I have the following: H5.Open(); var h5 =…
Crosbie
  • 1,805
  • 2
  • 16
  • 22
24
votes
3 answers

Fastest way to write HDF5 files with Python?

Given a large (10s of GB) CSV file of mixed text/numbers, what is the fastest way to create an HDF5 file with the same content, while keeping the memory usage reasonable? I'd like to use the h5py module if possible. In the toy example below, I've…
Nicholas Palko
  • 813
  • 3
  • 11
  • 21
24
votes
2 answers

Iteratively writing to HDF5 Stores in Pandas

Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files: Prepare some data: In [1142]: store = HDFStore('store.h5') In [1143]: index = date_range('1/1/2000', periods=8) In [1144]: s = Series(randn(5),…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
24
votes
2 answers

hdf5 / h5py ImportError: libhdf5.so.7

I'm working on a project involving network messaging queues (msgpack, zmq, ...) on a RHEL 6.3 (x86_64) system. I was installing the most recent packages of glib, gevent, pygobject, pygtk, and such in order to get pylab / matplotlib to work (which…
cronburg
  • 892
  • 1
  • 8
  • 24
23
votes
3 answers

TensorFlow - tf.data.Dataset reading large HDF5 files

I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable length stored as a collection of compressed JPG images (to make size on disk manageable).…
verified.human
  • 1,287
  • 3
  • 17
  • 26
23
votes
2 answers

hdf5 file to pandas dataframe

I downloaded a dataset which is stored in .h5 files. I need to keep only certain columns and to be able to manipulate the data in it. To do this, I tried to load it in a pandas dataframe. I've tried to use: pd.read_hdf(path) But I get: No dataset…
Graham Slick
  • 6,692
  • 9
  • 51
  • 87
23
votes
2 answers

HDF5 file created with h5py can't be opened by h5py

I created an HDF5 file apparently without any problems, under Ubuntu 12.04 (32bit version), using Anaconda as Python distribution and writing in ipython notebooks. The underlying data are all numpy arrays. For example, import numpy as np import…
Lilith-Elina
  • 1,613
  • 4
  • 20
  • 31
22
votes
2 answers

HDF5 viewers/editors linux

HDFVIEW is pretty good, but are there any alternatives? It would be great to be able to change things like chunking/compression settings - hdfview doesn't have that functionality - without having to resort to loading the files in using…
tdc
  • 8,219
  • 11
  • 41
  • 63
22
votes
7 answers

numpy undefined symbol: PyFPE_jbuf

I am trying to use the One Million Song Dataset; for this I had to install python tables, numpy, cython, hdf5, numexpr, and so on. Yesterday I managed to install everything I needed, and after having some trouble with hdf5, I downloaded the precompiled…
frammnm
  • 537
  • 1
  • 5
  • 17
22
votes
2 answers

Does HDF5 support concurrent reads, or writes to different files?

I'm trying to understand the limits of HDF5 concurrency. There are two builds of HDF5: parallel HDF5 and default. The parallel version is currently supplied in Ubuntu, and the default in Anaconda (judged by --enable-parallel flag). I know that…
Maxim Imakaev
  • 1,435
  • 2
  • 13
  • 26
22
votes
3 answers

Close an open h5py data file

In our lab we store our data in hdf5 files through the Python package h5py. At the beginning of an experiment we create an hdf5 file and store array after array of data in the file (among other things). When an experiment fails or is…
Adriaan Rol
  • 420
  • 2
  • 4
  • 12
21
votes
1 answer

Writing & Appending arrays of float to the only dataset in hdf5 file in C++

I am processing a number of files; processing each file will output several thousand arrays of floats, and I will store the data from all files in one huge dataset in a single hdf5 file for further processing. The thing is currently I am confused…
Karl
  • 5,613
  • 13
  • 73
  • 107
21
votes
3 answers

Storing numpy sparse matrix in HDF5 (PyTables)

I am having trouble storing a numpy csr_matrix with PyTables. I'm getting this error: TypeError: objects of type ``csr_matrix`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or…
pnsilva
  • 655
  • 1
  • 9
  • 20
20
votes
1 answer

When to use the .ckpt vs .hdf5 vs. .pb file extensions in Tensorflow model saving?

Tensorflow explains that models can be saved in three file formats: .ckpt or .hdf5 or .pb. There's a lot of documentation so it would be nice to get a simpler comparison of when to use which file format. Here's my current understanding: ckpt From…
skeller88
  • 4,276
  • 1
  • 32
  • 34