Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, MATLAB, Java, …)

2598 questions
1 vote, 1 answer

efficient way to find last time stamp for each unique value in a column in HDF5 table

How can I efficiently find the last time stamp (from the Datetime column) for each unique value in the SecurityID column? There are roughly 1000 unique values in the SecurityID column. Currently I query the whole table for each unique value in…
user308883
1 vote, 1 answer

Writing/reading large files with HDF5 and MPI using 1 process, from Python

When writing a large dataset to a file using parallel HDF5 via h5py and mpi4py (and quite possibly also when using HDF5 and MPI directly from C), I get the following error if using the mpio driver with a single process: OSError: Can't prepare for…
jmd_dk
1 vote, 0 answers

Saving and editing large amount of data in many files

I am working on C# software for Windows which controls an acquisition device with 256 channels, 10 kHz, 16 bit (5 MB/s). The acquisition generally lasts between 1 minute and 1 hour; it is saved as raw data in a binary file whose resulting size may reach many…
Paolo
1 vote, 1 answer

How to read/write 3D+ arrays with JHDF5?

I see no methods to write arrays with more than 2 dimensions in IHDF5SimpleWriter. Is it possible to do this?
Suzan Cioc
1 vote, 1 answer

select rows by comparing columns using HDFStore

How can I select some rows by comparing two columns from an HDF5 file using pandas? The HDF5 file is too big to load into memory. For example, I want to select rows where column A and column B are equal. The dataframe is saved in the file 'mydata.hdf5'.…
Lee
1 vote, 0 answers

Matlab HDF5 and JHDF5 library mismatch crash?

I'm using Matlab version R2013b on Linux Ubuntu 12.04 and have a Matlab application that depends on the Java HDF5 library (the one from the HDF Group). My problem is that the JHDF5 wrapper, composed of the two files jhdf5.jar and libjhdf5.so, seems to be in some…
SkyWalker
1 vote, 1 answer

HDFStore start stop not working

Is it clear what I am doing wrong? I'm experimenting with pandas HDFStore.select start and stop options and it's not making a difference. The commands I'm using are: import pandas as pd hdf = pd.HDFStore(path %…
user3659451
1 vote, 0 answers

HDF5 Database Design Lookup Tables vs storage of redundant data

I've worked on several fairly small-scale legacy HDF5 databases, and each one uses grouping to perform lookups. As a contrived example, let's say I have one 2-dimensional dataset where each cell maps back to a group which may store…
alexa
1 vote, 1 answer

NetCDF 4.5 Java Problems with NetCDF files version 4 + old code for HDF does not work

I have a NetCDF version 3 file. I used the latest ncks for Windows (released 1 Oct 2014) to rechunk my file: ncks -4 --cnk_dmn lat,4 --cnk_dmn lon,4 --cnk_dmn time,512 2014.nc 2014_chunked.nc which produced a 2014_chunked.nc file of NetCDF version 4…
Antonio
1 vote, 1 answer

"Group By" multiple columns on Large Data in HDFStore

Pandas "Group By" Query on Large Data in HDFStore? I have tried the example in the answer except that I would like to be able to group by two columns. Basically, modifying the code to look like with pd.get_store(fname) as store: …
1 vote, 1 answer

Assign dimensions from variables

I have a netCDF dataset with x, y as spatial dimensions (in Lambert conic projection), which are just enumerated values [0:495], [0:309], and lat, lon variables as 2D meshes with shape (309, 495). I want to assign the lon, lat variables to the x, y dimensions…
vlad
1 vote, 1 answer

The HDF5 library does not compile with -mcmodel=large

I'm trying to compile a large-scale FORTRAN application and need to link the HDF5 library to it. The program needs to be compiled with gfortran and needs the -mcmodel=large option. When using only -mcmodel=medium I'm getting error messages…
Andre
1 vote, 0 answers

HDF5 Output Error with Parallel Programming in Python (HDF5: infinite loop closing library)

I'm trying to parallelize my Python code with MPI. I'm reading my input from a txt file and writing the output to an HDF5 file. When I submit my job to the queue (just one node, 32ppn), I get the following error when opening the output hdf5…
d1415
1 vote, 1 answer

Validating an HDF5 superblock checksum

I am having a problem writing a program which verifies the checksum in the superblock of an HDF5, Version 2 file. I am not using the HDF5 software, but I have a copy of H5_checksum_fletcher32 (from the HDF5 H5checksum.c) in my code. I can assume…
jkane
1 vote, 0 answers

PyTables writing error

I am creating and filling a PyTables CArray the following way: #a,b = scipy.sparse.csr_matrix f = tb.open_file('../data/pickle/dot2.h5', 'w') filters = tb.Filters(complevel=1, complib='blosc') out = f.create_carray(f.root, 'out',…
fsociety