Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to store large amounts of numerical data efficiently
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …); a minimal example with the Python wrapper is sketched below
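
A minimal sketch using the h5py wrapper, assuming h5py and NumPy are installed; the file and dataset names here are arbitrary:

    import h5py
    import numpy as np

    # Write a small dataset with a metadata attribute, then read it back.
    with h5py.File("example.h5", "w") as f:
        dset = f.create_dataset("measurements", data=np.arange(12).reshape(3, 4))
        dset.attrs["units"] = "volts"

    with h5py.File("example.h5", "r") as f:
        print(f["measurements"][:])              # loads the full array as a NumPy array
        print(f["measurements"].attrs["units"])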

2598 questions
10 votes, 2 answers

Is there a way to get a numpy-style view to a slice of an array stored in a hdf5 file?

I have to work on large 3D cubes of data. I want to store them in HDF5 files (using h5py or maybe pytables). I often want to perform analysis on just a section of these cubes. This section is too large to hold in memory. I would like to have a numpy…
Caleb
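
A hedged sketch of the behaviour the question is probing: an h5py Dataset is only read when it is sliced, so the dataset handle itself acts as a lazy stand-in for the array (the file and dataset names below are hypothetical):

    import h5py

    with h5py.File("cubes.h5", "r") as f:      # hypothetical file
        cube = f["cube0"]                      # h5py Dataset: no data read yet
        section = cube[10:20, :, :]            # only this slab is read into memory
        # Note: the slice is a NumPy copy, not a live view; re-slice `cube`
        # whenever a different (or refreshed) section is needed.
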
10 votes, 1 answer

Get column names (headers) from hdf file

I was wondering how to get the column names (seemingly stored in the hdf header) of an hdf file; for example, a file might have columns named [a,b,c,d] while another file has columns [a,b,c] and yet another has columns [b,e,r,z]; and I would like to…
Cenoc
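
Assuming the file was written by pandas in table format, a sketch of recovering the column names without loading the data (the store path and key are placeholders):

    import pandas as pd

    with pd.HDFStore("data.h5", mode="r") as store:
        print(store.keys())                              # e.g. ['/df']
        # Reading zero rows still returns the column index (table format only).
        cols = store.select("df", start=0, stop=0).columns
        print(list(cols))
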
10 votes, 2 answers

shared library locations for matlab mex files:

I am trying to write a MATLAB mex function which uses libhdf5; my Linux install provides libhdf5-1.8 shared libraries and headers. However, my version of MATLAB, r2007b, provides a libhdf5.so from the 1.6 release. (MATLAB .mat files bootstrap hdf5,…
shabbychef
10 votes, 2 answers

Converting large SAS dataset to hdf5

I have multiple large (>10GB) SAS datasets that I want to convert for use in pandas, preferably in HDF5. There are many different data types (dates, numerical, text) and some numerical fields also have different error codes for missing values (i.e.…
vgregoire
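
A hedged sketch of one common route: read the SAS file in chunks with pandas and append each chunk to a table-format HDF5 store, so the 10 GB+ files never have to fit in memory at once (file names and chunk size are placeholders):

    import pandas as pd

    reader = pd.read_sas("big.sas7bdat", chunksize=100_000)  # iterate instead of loading everything
    with pd.HDFStore("big.h5", mode="w") as store:
        for chunk in reader:
            # format="table" allows appending; text columns may need
            # min_itemsize, and error-coded missings may need cleaning first.
            store.append("data", chunk, format="table", data_columns=True)
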
10 votes, 3 answers

Python-created HDF5 dataset transposed in Matlab

I have some data that I share between Python and Matlab. I used to do it by saving NumPy arrays in MATLAB-style .mat files but would like to switch to HDF5 datasets. However, I've noticed a funny feature: when I save a NumPy array in an HDF5 file…
John Manak
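
The usual explanation is the memory-layout convention: h5py writes arrays in row-major (C) order while MATLAB reads them in column-major (Fortran) order, so the dimensions appear reversed. A sketch of the common workaround on the Python side (names are illustrative):

    import h5py
    import numpy as np

    a = np.arange(6).reshape(2, 3)          # 2x3 in Python

    with h5py.File("for_matlab.h5", "w") as f:
        # Store the transpose so MATLAB's column-major read
        # yields a 2x3 array instead of a 3x2 one.
        f.create_dataset("a", data=a.T)
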
10 votes, 3 answers

PyTables dealing with data with size many times larger than size of memory

I'm trying to understand how PyTables manages data whose size is greater than the available memory. Here is a comment from the PyTables code (link to GitHub): # Nodes referenced by a variable are kept in `_aliveNodes`. # When they are no longer referenced, they…
Gill Bates
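
A sketch of the out-of-core access pattern PyTables is designed for: iterate over a table, or read explicit row ranges, so only a bounded window of rows is resident at a time (file, node, and column names are hypothetical):

    import tables as tb

    with tb.open_file("big.h5", mode="r") as h5:
        table = h5.root.data                    # hypothetical Table node
        total = 0.0
        for row in table.iterrows():            # rows are fetched in internal chunks
            total += row["value"]               # hypothetical column
        first_block = table.read(start=0, stop=100_000)   # explicit bounded read
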
10 votes, 2 answers

HDF5 Storage Overhead

I'm writing a large number of small datasets to an HDF5 file, and the resulting filesize is about 10x what I would expect from a naive tabulation of the data I'm putting in. My data is organized hierarchically as follows: group 0 -> subgroup 0 …
apdnu
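
One common mitigation, sketched here without claiming it is the accepted answer: per-dataset metadata (object headers, B-trees, chunk bookkeeping) dominates when datasets are tiny, so consolidating many small arrays into fewer, larger, compressed datasets usually shrinks the file substantially. For example, with h5py:

    import h5py
    import numpy as np

    small_arrays = [np.random.rand(10) for _ in range(1000)]   # many tiny pieces of data

    with h5py.File("consolidated.h5", "w") as f:
        # One chunked, compressed dataset instead of 1000 datasets,
        # each of which would pay its own metadata overhead.
        f.create_dataset("all", data=np.vstack(small_arrays),
                         chunks=True, compression="gzip")
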
10 votes, 2 answers

Better way to open HDF5 files in C++

I have been trying to come up with a way to get around some of the shortcomings of the HDF5 C++ bindings. Currently, my code is littered with try/catch blocks similar to the following: H5::Exception::dontPrint(); H5::H5File *file = NULL; try { …
apdnu
9 votes, 1 answer

How to install hdf5 on Docker image with Linux alpine 3.13

I am building a Docker image with Python 3.7.10 (Alpine Linux v3.13), but when building the image with docker build . the hdf5 package fails during installation. This is my Dockerfile: FROM python:3.7.10-alpine3.13 RUN mkdir /app WORKDIR…
Ander
9 votes, 1 answer

MemoryError: Unable to allocate 30.4 GiB for an array with shape (725000, 277, 76) and data type float64

It gives that memory error, but memory capacity is never reached. I have 60 GB of RAM on the SSH server and the full dataset process consumes about 30. I am trying to train an autoencoder with k-fold. Without k-fold the training works fine. The raw dataset…
iftekm
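
A hedged sketch of the usual workaround: leave the array in the HDF5 file and index the h5py dataset lazily inside each fold, instead of materializing a 30 GiB float64 array up front (file, dataset, and split setup are illustrative; scikit-learn is assumed for the splitting):

    import h5py
    import numpy as np
    from sklearn.model_selection import KFold

    f = h5py.File("dataset.h5", "r")           # hypothetical file
    X = f["features"]                          # h5py Dataset: stays on disk

    for train_idx, val_idx in KFold(n_splits=5).split(np.arange(X.shape[0])):
        # Read only small, sorted index blocks per batch; h5py fancy indexing
        # requires increasing indices, and huge selections would defeat the purpose.
        val_batch = X[np.sort(val_idx)[:1024]]
        # ...feed val_batch / training batches to the autoencoder here...
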
9 votes, 9 answers

HDF5 library version error - HDF5 ver 1.10.4

I'm trying to import some packages with Spyder (OS x64), Anaconda, and Python 3.x. The error is well known on the internet. The proposed solution is to match library version 1.10.5 with the HDF5 version (mine is 1.10.4). The question is that I…
Mirko Piccolo
9 votes, 6 answers

Saving in a file an array or DataFrame together with other information

The statistical software Stata allows short text snippets to be saved within a dataset. This is accomplished using notes and/or characteristics. This feature is of great value to me, as it allows me to save a variety of information, ranging…
user8682794
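
A pandas-only sketch of one way to do this with HDF5: store the DataFrame, then attach free-form text as attributes on the stored node, much like Stata notes (the key and attribute name are made up):

    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3]})

    with pd.HDFStore("with_notes.h5", mode="w") as store:
        store.put("df", df, format="table")
        # Attach a note as an HDF5 attribute of the stored node.
        store.get_storer("df").attrs.notes = "collected 2020-01-15; units are mm"

    with pd.HDFStore("with_notes.h5", mode="r") as store:
        print(store.get_storer("df").attrs.notes)
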
9 votes, 3 answers

OverflowError with Pandas to_hdf

Python newbie here. I am trying to save a large data frame into an HDF file with lz4 compression using to_hdf. I use Windows 10, Python 3, Pandas 20.2. I get the error “OverflowError: Python int too large to convert to C long”. None of the machine…
Ron
9 votes, 4 answers

Yum Install libhdf5-dev on Amazon Linux

I am working on deploying a project that uses hdf5 as a dependency: http://docs.h5py.org/en/latest/build.html And I am having a devil of a time installing one of the dependencies for my Elastic Beanstalk deployment. HDF5 1.8.4 or newer, shared library…
Pylander
9 votes, 1 answer

Access HDF files stored on s3 in pandas

I'm storing pandas data frames dumped in HDF format on S3. I'm pretty much stuck, as I can't pass the file pointer, the URL, the s3 URL, or a StringIO object to read_hdf. If I understand it correctly, the file must be present on the filesystem. Source:…
fodma1
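
A sketch of the usual workaround: download the object to a local temporary file and point read_hdf at that path, since read_hdf expects a real file on disk (bucket, key, and store key below are placeholders; boto3 is assumed):

    import tempfile
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    with tempfile.NamedTemporaryFile(suffix=".h5") as tmp:
        s3.download_fileobj("my-bucket", "frames/data.h5", tmp)   # fetch to local disk
        tmp.flush()
        df = pd.read_hdf(tmp.name, key="df")                      # hypothetical key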