Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and metadata
  • Complete, well-documented library in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)
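As a minimal illustration of one of those wrappers (h5py, for Python), the sketch below creates a file containing a group, a compressed dataset and some attributes used as metadata, then reads it back. The file name `example.h5`, the object names and the helper `write_and_read` are hypothetical.

```python
import h5py
import numpy as np


def write_and_read(path="example.h5"):
    # Write: groups behave like directories, datasets like NumPy arrays
    with h5py.File(path, "w") as f:
        grp = f.create_group("experiment")
        dset = grp.create_dataset(
            "temperature",
            data=np.linspace(20.0, 25.0, 6),
            compression="gzip",  # optional per-dataset compression
        )
        dset.attrs["units"] = "degC"  # metadata attached to the dataset
        f.attrs["description"] = "demo file"  # metadata attached to the root group

    # Read back: slicing a dataset returns a NumPy array
    with h5py.File(path, "r") as f:
        values = f["experiment/temperature"][:]
        units = f["experiment/temperature"].attrs["units"]
    return values, units
```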

Some links to get started

2598 questions
20 votes, 8 answers

Save Keras ModelCheckpoints in Google Cloud Bucket

I'm training an LSTM network on Google Cloud Machine Learning Engine using Keras with the TensorFlow backend. After some adjustments to gcloud and my Python script, I managed to deploy my model and run a successful training job. I…
Kevin Katzke
20 votes, 5 answers

How to differentiate between HDF5 datasets and groups with h5py?

I use the Python package h5py (version 2.5.0) to access my HDF5 files. I want to traverse the contents of a file and do something with every dataset. Using the visit method:

    import h5py

    def print_it(name):
        dset = f[name]
        print(dset)
    …
NoDataDumpNoContribution
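One way to make the distinction is `visititems`, which passes the callback the object as well as its name, so `isinstance(obj, h5py.Dataset)` (or `h5py.Group`) can be tested directly instead of re-looking up `f[name]`. A minimal sketch; `list_datasets` is a hypothetical helper name.

```python
import h5py


def list_datasets(path):
    """Return the full names of every dataset in the file, skipping groups."""
    names = []

    def visitor(name, obj):
        # visititems hands over the object itself, so its type can be checked
        if isinstance(obj, h5py.Dataset):
            names.append(name)
        # returning None lets the traversal continue

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return names
```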
20 votes, 2 answers

Get list of HDF5 contents (Pandas HDFStore)

I have no problem selecting content from a table within an HDF5 store:

    with pandas.HDFStore(data_store) as hdf:
        df_reader = hdf.select('my_table_id', chunksize=10000)

How can I get a list of all the tables to select from using pandas?
bcollins
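`HDFStore.keys()` returns the path of every pandas object in the store, each with a leading `/`. A minimal sketch; `list_tables` is a hypothetical helper, and writing the test data requires the PyTables package that `HDFStore` relies on anyway.

```python
import pandas as pd


def list_tables(store_path):
    """Return the keys (object paths such as '/my_table_id') in a pandas HDFStore."""
    with pd.HDFStore(store_path, mode="r") as hdf:
        return hdf.keys()
```

Each returned key can then be passed to `hdf.select(key, chunksize=10000)` as in the question.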
20 votes, 1 answer

How to partially copy an HDF5 file into a new one with Python, keeping the same structure?

I have a large HDF5 file that looks something like this:

    A/B/dataset1, dataset2
    A/C/dataset1, dataset2
    A/D/dataset1, dataset2
    A/E/dataset1, dataset2
    ...

I want to create a new file with only this:

    A/B/dataset1, dataset2
    A/C/dataset1, dataset2

What…
graham
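h5py's `Group.copy` copies a group recursively (datasets and attributes included) and can target a group in another file. A minimal sketch with a hypothetical helper `copy_subset`: it recreates each parent group (such as `A`) in the new file, then copies the selected subgroups under their original names. Attributes on the recreated parent groups themselves are not copied.

```python
import h5py


def copy_subset(src_path, dst_path, paths):
    """Copy only the listed groups/datasets into a new file, keeping their paths."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for path in paths:
            parent, _, leaf = path.rpartition("/")
            # Recreate the parent groups (e.g. "A") in the new file if needed
            dst_parent = dst.require_group(parent) if parent else dst
            # Recursive copy of the group or dataset, across files
            src.copy(src[path], dst_parent, name=leaf)
```

Usage for the layout in the question: `copy_subset("big.h5", "small.h5", ["A/B", "A/C"])`.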
20 votes, 1 answer

3D surface from a rectangular array of heights

I am trying to plot some HDF data in matplotlib. After importing it with h5py, the data is stored as an array like this:

    array([[151, 176, 178],
           [121, 137, 130],
           [120, 125, 126]])

In this case, x and y values are just the…
Paweł Rumian
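A minimal matplotlib sketch, assuming (since the excerpt is cut off) that x and y are simply the column and row indices of the array. `np.meshgrid` expands those indices into coordinate grids with the same shape as the heights, which is what `plot_surface` expects. The function name `plot_height_surface` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to show a window instead
import matplotlib.pyplot as plt
import numpy as np


def plot_height_surface(z):
    """Plot a 2-D array of heights as a 3-D surface, using column/row indices as x/y."""
    z = np.asarray(z, dtype=float)
    rows, cols = z.shape
    # x varies along columns, y along rows; both end up with z's shape
    x, y = np.meshgrid(np.arange(cols), np.arange(rows))
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z, cmap="viridis")
    ax.set_xlabel("column index")
    ax.set_ylabel("row index")
    ax.set_zlabel("height")
    return fig, x, y
```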
19 votes, 3 answers

Confusion over '/' in names in HDF5 files

I am experiencing some really weird interactions between h5py, PyTables (via pandas), and C++-generated HDF5 files. It seems that h5check and h5py cope with type names containing '/', but pandas/PyTables cannot. Clearly, there is a gap in my…
Sardathrion - against SE abuse
19 votes, 3 answers

How to convert this type of data into something more readable in Python?

I have quite a big dataset. All the information is stored in an HDF5 file. I found the h5py library for Python. Everything works properly except for the [], and I have no idea how to convert it into something more readable. Can I do it at all?…
Dmytro Chasovskyi
18 votes, 2 answers

Google Protocol Buffers, HDF5, NumPy comparison (transferring data)

I need help making a decision. I need to transfer some data in my application and have to choose between these three technologies. I've read a little about each of them (tutorials, documentation) but still can't decide... How do…
illegal-immigrant
18 votes, 6 answers

How to read a v7.3 mat file via h5py?

I have a struct array created by MATLAB and stored in a v7.3-format MAT-file:

    struArray = struct('name', {'one', 'two', 'three'}, 'id', {1,2,3}, 'data', {[1:10], [3:9], [0]})
    save('test.mat', 'struArray',…
Eastsun
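MATLAB v7.3 MAT-files are HDF5 files. In files written by MATLAB, a struct array is usually stored as a group whose fields are datasets of HDF5 object references (the referenced data living in a `#refs#` group), char arrays are stored as `uint16` code units, and arrays are written column-major, so they appear transposed in h5py. The sketch below assumes that layout; `read_struct_field` and `matlab_chars_to_str` are hypothetical helpers, and the test builds a small imitation of such a file rather than real MATLAB output.

```python
import h5py
import numpy as np


def read_struct_field(f, struct_name, field):
    """Return one field of a MATLAB v7.3 struct array as a list of NumPy arrays.

    Assumes the field dataset holds one HDF5 object reference per struct element.
    """
    refs = f[struct_name][field][()]  # NumPy object array of h5py.Reference
    values = []
    for ref in refs.flat:
        # Follow the reference, then transpose back to MATLAB's orientation
        values.append(f[ref][()].T)
    return values


def matlab_chars_to_str(arr):
    """Decode a MATLAB char array, assumed to be stored as uint16 code units."""
    return "".join(chr(int(c)) for c in np.asarray(arr).flat)
```

With the file from the question, `read_struct_field(f, "struArray", "data")` would then give the three `data` arrays.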
18 votes, 4 answers

Searching an HDF5 dataset

I'm currently exploring HDF5. I've read the interesting comments in the thread "Evaluating HDF5" and I understand that HDF5 is a solution of choice for storing data, but how do you query it? For example, say I have a big file containing some…
Pierre
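HDF5 itself has no query language; h5py exposes datasets as sliceable arrays, so one simple approach is to scan a large dataset block by block and filter each block with NumPy, which keeps memory use bounded. The generator below is a hypothetical helper sketching that idea; for PyTables tables, `Table.where()` is a built-in alternative.

```python
import h5py
import numpy as np


def query_rows(dset, predicate, block=100_000):
    """Yield (indices, rows) for rows where predicate(rows) is True.

    predicate receives a NumPy block and must return a 1-D boolean mask over its rows.
    """
    for start in range(0, dset.shape[0], block):
        chunk = dset[start:start + block]  # only this block is read into memory
        idx = np.nonzero(predicate(chunk))[0]
        if idx.size:
            yield start + idx, chunk[idx]
```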
17 votes, 1 answer

How to get faster code than numpy.dot for matrix multiplication?

In "Matrix multiplication using hdf5" I use HDF5 (PyTables) for big matrix multiplication, but I was surprised that with HDF5 it runs even faster than plain numpy.dot with the matrices stored in RAM. What is the reason for this behavior? And…
mrgloom
17 votes, 3 answers

Can h5py load a file from a byte array in memory?

My Python code receives a byte array that represents the bytes of an HDF5 file. I'd like to read this byte array into an in-memory h5py file object without first writing it to disk. This page says that I can open a memory-mapped…
mahonya
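Since version 2.9, h5py can open Python file-like objects, so the byte array can be wrapped in `io.BytesIO` and opened directly, with no temporary file. A minimal sketch; `open_from_bytes` is a hypothetical helper name.

```python
import io

import h5py


def open_from_bytes(raw: bytes) -> h5py.File:
    """Open an HDF5 file image held in memory (requires h5py >= 2.9)."""
    # h5py reads through the file-like object instead of a path on disk
    return h5py.File(io.BytesIO(raw), "r")
```

The returned object should be closed like any other `h5py.File`, for example by using it in a `with` block.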
17 votes, 2 answers

Removing data from an HDF5 file

I have an HDF5 file with a one-dimensional (N x 1) dataset of compound elements; it's actually a time series. The data is first collected offline into the HDF5 file and then analyzed. During analysis most of the data turns out to be…
Joonas Pulakka
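Deleting an object with `del f[name]` only unlinks it; HDF5 does not normally shrink the file, so the space is typically reclaimed by rewriting the data, for example with the `h5repack` command-line tool or by copying only what should be kept into a new file. The sketch below takes the second route for a 1-D (possibly compound) dataset, streaming it in blocks through a keep-mask; `filter_copy` and its `keep` callback are hypothetical names.

```python
import h5py


def filter_copy(src_path, dst_path, name, keep, block=100_000):
    """Copy only the rows of dataset `name` for which keep(rows) is True into a new file."""
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        dset = src[name]
        tail = dset.shape[1:]
        # Start empty and grow along axis 0; a resizable dataset is chunked automatically
        out = dst.create_dataset(
            name, shape=(0,) + tail, maxshape=(None,) + tail, dtype=dset.dtype
        )
        for start in range(0, dset.shape[0], block):
            rows = dset[start:start + block]
            rows = rows[keep(rows)]  # boolean mask from the caller's predicate
            if len(rows):
                n = out.shape[0]
                out.resize(n + len(rows), axis=0)
                out[n:] = rows
```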
17 votes, 3 answers

Deleting information from an HDF5 file

I realize that an SO user has asked this question before, but it was asked in 2009 and I was hoping that more knowledge of HDF5 is now available or that newer versions have fixed this particular issue. To restate the question here concerning my own…
Ason
16 votes, 0 answers

Writing to two different files with HDF5

I have a small C library that uses HDF5 (v1.8.14) to write data under Windows. That library is then used by a C# app that does some other work and then needs to write quite a lot of data. I now need to launch two instances of the application…
Mauro Ganswer