Questions tagged [hdf5]

The Hierarchical Data Format version 5 (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to efficiently store large amounts of numerical data
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number and size of the datasets
  • Flexible in the kind and structure of the data and meta-data
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools are available (Python, Matlab, Java, …)
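A minimal sketch of both halves of the definition (the file format and the libraries that manipulate it), assuming the third-party h5py and NumPy packages, i.e. one of the Python wrappers mentioned above; the C and Fortran libraries expose equivalent calls. It writes a small gzip-compressed dataset with an attribute inside a group, then reads part of it back. The file, group, dataset, and attribute names are hypothetical.

```python
import tempfile
from pathlib import Path

import h5py  # third-party Python wrapper around the HDF5 C library (assumed installed)
import numpy as np

data = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.h5")  # hypothetical file name

    # Write: groups behave like directories, datasets hold n-dimensional arrays.
    with h5py.File(path, "w") as f:
        grp = f.create_group("experiment")  # hypothetical group name
        # gzip compression requires chunked storage; h5py picks a chunk shape automatically.
        dset = grp.create_dataset("measurements", data=data, compression="gzip")
        # Metadata is attached to the dataset as attributes.
        dset.attrs["units"] = "volts"

    # Read back: slicing reads just the requested selection (for a chunked,
    # compressed dataset, the chunks containing it) rather than the whole array.
    with h5py.File(path, "r") as f:
        dset = f["experiment/measurements"]
        shape = dset.shape
        first_row = dset[0, :]
        units = dset.attrs["units"]
        full = dset[...]

print(shape, first_row, units)
```

The path "experiment/measurements" mirrors the filesystem-like hierarchy that gives the format its name.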

2598 questions

1 vote, 1 answer

How to read one component of ArrayType stored in HDF5

I have an HDF5 dataset on file, which was written using an H5::ArrayType for double[3]. The DataSpace is one-dimensional (rank=1) with ndat entries (each of type double[3]). Now I want to read only the second, say, of each double[3] into a 1D buffer…
Walter

1 vote, 0 answers

HDF5 error: no file name specified, crashes in Boost

I'm trying to build some source code on OS X but get the following error when running it. It seems to be related to HDF5, but I have no clue how to fix it, since the Linux and Windows builds of this source code already work fine. Error detected in HDF5…
Yasin

1 vote, 1 answer

Should I create one large frame with an index, or many groups, in an HDFStore?

I have a daily time series of ~1.5 million rows per day, a 4-dimensional index, and 2 columns. So far I've put all of this into one DataFrame and shoved it into a single group in an HDFStore. The problem now is that continuously appending to this…
user2734178

1 vote, 0 answers

python netCDF4 HDF error

I'm using the python-netCDF4 library for reading/writing NetCDF files and have run into an issue where writing a character array fails for some reason, and the error message is not very informative: File "netCDF4/_netCDF4.pyx", line 1675, in …
kakk11

1 vote, 2 answers

Strings vs binary for storing variables inside the file format

We aim at using HDF5 for our data format. HDF5 has been selected because it is a hierarchical filesystem-like cross-platform data format and it supports large amounts of data. The file will contain arrays and some parameters. The question is about…
Mauro Ganswer

1 vote, 1 answer

"make check" fails when installing HDF5

I downloaded hdf5-1.8.15-patch1.tar.bz2 and tried to install it on my Ubuntu machine with these commands: CC=mpicc ./configure --enable-parallel --enable-shared; make; make check. Yet during make check, I got this error: ***** 1 FAILURE!…
yanggao

1 vote, 1 answer

Processing data on disk with a Pandas DataFrame

Is there a way to take a very large amount of data on disk (a few hundred GB) and interact with it on disk as a pandas DataFrame? Here's what I've done so far: Described the data using pytables and this example:…

1 vote, 0 answers

IO:Error in IPython notebook

I have a problem with creating an HDF5 file. On Ubuntu it runs without error messages, but when I try the code on Windows 8 I get an IO:Error. I do not know how to get around this problem. My code for the cell (the rest of the code is too big to post…

1 vote, 0 answers

How to write data efficiently to HDF5 storage?

I am trying to save a 2-dimensional array of size 20000 x 30000, but my computer can't handle this process and dies when I run the script. I have an 8 GB RAM Windows 7 machine with Python 2.7 64-bit (Anaconda install). Can I do some optimization here? import…
gustavgans

1 vote, 1 answer

Visualization of a 3-dimensional grid from X_Y_Z (separate datasets) in Paraview without using XDMF

Reading netCDF files with Paraview using XDMF: I used to parse netCDF files with an XDMF script in order to create a 3DSMesh in Paraview. On top of it, I was adding scalar or vector fields (so the 3DSMesh provides physical coordinates). I never though…
trblnc

1 vote, 1 answer

Error with datetime column while serialising a DataFrame into an HDF5 store

I'm trying to save a DataFrame into an HDF5 store using the pandas built-in function to_hdf, but this raises the following exception: File "C:\python\lib\site-packages\pandas\io\pytables.py", line 3433, in create_axes raise e TypeError: Cannot…
Scilear

1 vote, 1 answer

Pandas multiindex and pytables... separate indexes or one concatenated index?

What is the structure of a pandas multiindex on HDF5 when the data frame is saved to HDF5 through pytables? Are each of the parts a separate index or is there one concatenated index?

1 vote, 2 answers

Python code for writing xmf for a single HDF5 that contains time sequenced data to visualise in Paraview, Visit

I have been trying to load an HDF5 file in Paraview using XMF. This is in the paradigm of visualizing big data using HDF as storage and XMF as a metadata linker to the HDF file, read by Paraview, VisIt and other big-data visualization software. In the…
abhra

1 vote, 1 answer

Adding a row to a Pandas DataFrame that would duplicate index

I have a DataFrame indexed by datetime objects. I am ultimately going to write this DataFrame to an HDF5 file using HDFStore.append. I am adding a lot of rows that need to be written to this HDF5 file. If I use HDFStore.append for every…
baconwichsand

1 vote, 1 answer

HDF gzip compression vs. ASCII gzip compression

I have a 2D matrix with 1100x1600 data points. Initially, I stored it in an ASCII file, which I tar-gzipped using the command tar -cvzf ascii_file.tar.gz ascii_file. Now I wanted to switch to HDF5 files, but they are too large, at least in the way I…
Alf