Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files
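
As a small illustration of the "binary file format" point, here is a minimal sketch that checks whether a file carries the 8-byte HDF5 signature. It uses only the Python standard library and does not parse the file. The function name `looks_like_hdf5` and the `max_offset` cutoff are assumptions made for this example, not part of any HDF5 library.

```python
# The 8-byte signature that opens the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if an HDF5 signature is found at offset 0, 512, 1024, ...

    The format allows an optional user block before the superblock, so the
    signature may sit at offset 0 or at successive doublings starting at 512.
    max_offset is an arbitrary cutoff chosen for this sketch.
    """
    with open(path, "rb") as handle:
        offset = 0
        while offset <= max_offset:
            handle.seek(offset)
            chunk = handle.read(len(HDF5_SIGNATURE))
            if len(chunk) < len(HDF5_SIGNATURE):
                # Past the end of the file: larger offsets cannot match.
                return False
            if chunk == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```

For real work, a library such as h5py or PyTables is the usual way to open and inspect these files; a check like this is only a quick pre-filter.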

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)


2598 questions
11
votes
3 answers

Using compression with Pandas and HD5 / HDFStore

For a few aspects of a project, using "h5" storage would be ideal. However, the files are becoming massive and frankly we're running out of space. This statement... store.put(storekey, data, table=False, compression='gzip') does not produce any…
TravisVOX
  • 20,342
  • 13
  • 37
  • 41
11
votes
7 answers

Looking for something similar to offsetof() for non-POD types

I'm looking for a way to obtain offsets of data members of a C++ class which is of non-POD nature. Here's why: I'd like to store data in HDF5 format, which seems most suited for my kind of material (numerical simulation output), but it is perhaps a…
yungchin
  • 1,519
  • 2
  • 15
  • 17
11
votes
1 answer

HDFStore.append(string, DataFrame) fails when string column contents are longer than those already there

I have a Pandas DataFrame stored via an HDFStore that essentially stores summary rows about test runs I am doing. Several of the fields in each row contain descriptive strings of variable length. When I do a test run, I create a new DataFrame with a…
ultra909
  • 1,740
  • 1
  • 23
  • 25
11
votes
2 answers

Get the dimensions of a HDF5 dataset

I'm using some HDF5 Files in my C++ program and I have a question regarding the H5Dopen function. Is it possible to get the dimensions of a hdf5 dataset in a given file? hid_t file, dset; herr_t status; file = H5Fopen (filenameField, H5F_ACC_RDONLY,…
raspiede
  • 179
  • 1
  • 1
  • 11
10
votes
3 answers

Failed to build h5py on mac M1

I am trying to install AlphaFold in a python virtual env. While trying to install dependencies, I get this error: ERROR: Could not find a version that satisfies the requirement tensorflow==1.14 (from versions: none) ERROR: No matching…
sam
  • 101
  • 1
  • 4
10
votes
1 answer

Can we disable h5py file locking for python file-like object?

When opening an HDF5 file with h5py you can pass in a python file-like object. I have done so, where the file-like object is a custom implementation of my own network-based transport layer. This works great, I can slice large HDF5 files over a high…
David Parks
  • 30,789
  • 47
  • 185
  • 328
10
votes
4 answers

Most efficient way to use a large data set for PyTorch?

Perhaps this question has been asked before, but I'm having trouble finding relevant info for my situation. I'm using PyTorch to create a CNN for regression with image data. I don't have a formal, academic programming background, so many of my…
Doug MacArthur
  • 125
  • 1
  • 2
  • 9
10
votes
3 answers

Object dtype dtype('O') has no native HDF5 equivalent

Well, it seems a couple of similar questions have been asked here on Stack Overflow, but none of them seems to have been answered correctly or properly, nor did they describe exact examples. I have a problem with saving an array or list into hdf5 ... I have a…
Isaac Sim
  • 539
  • 1
  • 7
  • 23
10
votes
4 answers

Keras custom data generator for large hdf5 file which does not fit into memory

I'm trying to use the pretrained InceptionV3 model to classify the food-101 dataset, which contains food images for 101 categories, 1000 per category. I've preprocessed this dataset into a single hdf5 file (I assumed this is beneficial compared to…
logi0517
  • 813
  • 1
  • 13
  • 32
10
votes
2 answers

How to append new categories to HDF5 in pandas?

Answered: It appears that this datatype will not be suited for adding arbitrary strings into hdf5store. Background I work with a script which generates single rows of results and appends them to a file on disk in an iterative approach. To speed…
sudonym
  • 3,788
  • 4
  • 36
  • 61
10
votes
1 answer

How to point to C header files in GO?

New to Go, so go easy on me. I installed this package, which provides Go bindings for HDF5: go get github.com/sbinet/go-hdf5 and I get fatal error: hdf5.h: No such file or directory // #include "hdf5.h" the file hdf5.h (which…
Brom Quinn
  • 325
  • 1
  • 3
  • 8
10
votes
0 answers

Reading and writing reference types using hdf5.net

I'm using HDF5DotNet to write a generic data logging API, DataLog. The idea is to use reflection to automatically create a H5 compound data type which contains the fields in T. The user can then easily add data to the data log using a write(T[]…
Matt Williams
  • 1,596
  • 17
  • 24
10
votes
1 answer

Deleting a key/table in an HDF Store with Python

Is there a pyTables method similar to the following: with pd.get_store(my_store) as store: keys = store.keys() rem_key = min(sorted(keys)) store.remove(rem_key) I am essentially trying to access the HDF5 store's list of…
KidMcC
  • 486
  • 2
  • 7
  • 17
10
votes
1 answer

compressed files bigger in h5py

I'm using h5py to save numpy arrays in HDF5 format from python. Recently, I tried to apply compression and the size of the files I get is bigger... I went from things (every file has several datasets) like…
manu
  • 1,333
  • 2
  • 11
  • 24
10
votes
3 answers

Reading HDF5 files

Is there a way to read HDF5 files using the Scala version of Spark? It looks like it can be done in Python (via Pyspark), but I can't find anything for Scala.
John
  • 1,167
  • 1
  • 16
  • 33