Questions tagged [hdf5]

The Hierarchical Data Format (HDF5) is a binary file format designed to store large amounts of numerical data.

HDF5 refers to:

  • A binary file format designed to store large amounts of numerical data efficiently
  • Libraries of functions to create and manipulate these files

Main features

  • Free
  • Completely portable
  • Very mature
  • No limit on the number or size of datasets
  • Flexible in the kind and structure of data and metadata
  • Complete, well-documented libraries in C and Fortran
  • Many wrappers and tools available (Python, MATLAB, Java, …)

Some links to get started

2598 questions
16 votes, 3 answers

Sparse array support in HDF5

I need to store a 512^3 array on disk in some way and I'm currently using HDF5. Since the array is sparse, a lot of disk space gets wasted. Does HDF5 provide any support for sparse arrays?
andreabedini • 1,295
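HDF5 has no native sparse dataset type, but chunked storage gets most of the way there: chunks that are never written are simply not allocated in the file. A minimal h5py sketch (file path, chunk sizes, and values are made up for the demo):

```python
import os
import tempfile

import numpy as np
import h5py  # assumed installed

# Unwritten chunks of a chunked dataset take no space on disk, so a
# sparse 512^3 array stored this way stays small.
path = os.path.join(tempfile.mkdtemp(), "sparse.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("cube", shape=(512, 512, 512), dtype="f4",
                            chunks=(64, 64, 64), compression="gzip")
    dset[10, 20, 30] = 1.5    # only the chunks containing written
    dset[400, 100, 7] = -2.0  # points are allocated on disk

with h5py.File(path, "r") as f:
    value = f["cube"][10, 20, 30]

# A dense 512^3 float32 array would occupy 512 MB; this file stays small.
size = os.path.getsize(path)
```

Compression (`gzip` here) additionally shrinks the mostly-zero chunks that do get written.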
16 votes, 1 answer

Writing a large hdf5 dataset using h5py

At the moment, I am using h5py to generate hdf5 datasets. I have something like this import h5py import numpy as np my_data=np.genfromtxt("/tmp/data.csv",delimiter=",",dtype=None,names=True) myFile="/tmp/f.hdf" with h5py.File(myFile,"a") as f: …
NinjaGaiden • 3,046
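For datasets too large to build in memory first, one common pattern is to preallocate the dataset and fill it block by block. A sketch, assuming h5py is installed; all sizes and names are toy values:

```python
import os
import tempfile

import numpy as np
import h5py  # assumed installed

path = os.path.join(tempfile.mkdtemp(), "big.h5")
rows, cols, block = 10_000, 8, 1_000  # made-up demo sizes

with h5py.File(path, "a") as f:
    # Preallocate the full dataset, then write one block at a time so
    # the whole array never has to sit in memory at once.
    dset = f.create_dataset("data", shape=(rows, cols), dtype="f8",
                            chunks=(block, cols))
    for start in range(0, rows, block):
        dset[start:start + block] = np.random.rand(block, cols)

with h5py.File(path, "r") as f:
    shape = f["data"].shape
```

If the final number of rows is unknown up front, `maxshape=(None, cols)` plus `dset.resize()` supports appending instead.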
15 votes, 9 answers

How to best write out a std::vector<std::string> container to an HDF5 dataset?

Given a vector of strings, what is the best way to write them out to an HDF5 dataset? At the moment I'm doing something like the following: const unsigned int MaxStrLength = 512; struct TempContainer { char string[MaxStrLength]; }; …
Richard Corden • 21,389
15 votes, 2 answers

Sharing large datasets between Matlab and R

I need a relatively efficient way to share data between Matlab and R. I have checked SaveR and MATLAB R-link, but SaveR formats Matlab's binary data as text strings first and then prints them to an ASCII file, which is not efficient for large…
Amelio Vazquez-Reina • 91,494
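Plain HDF5 works well as a binary interchange format here: MATLAB reads it with `h5read`, and R has the rhdf5 and hdf5r packages. A sketch of the writing side in Python (file and dataset names are made up):

```python
import os
import tempfile

import numpy as np
import h5py  # assumed installed

# Write a numeric matrix once as plain HDF5; no text round-trip needed.
path = os.path.join(tempfile.mkdtemp(), "exchange.h5")
matrix = np.arange(12.0).reshape(3, 4)
with h5py.File(path, "w") as f:
    f.create_dataset("matrix", data=matrix)

# e.g. in MATLAB:  M = h5read('exchange.h5', '/matrix');
# e.g. in R:       m <- rhdf5::h5read('exchange.h5', 'matrix')
with h5py.File(path, "r") as f:
    roundtrip = f["matrix"][:]
```

Note that MATLAB and h5py disagree on row- versus column-major order, so matrices may come back transposed on the other side.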
15 votes, 2 answers

When reading a huge HDF5 file with pandas.read_hdf(), why do I still get MemoryError even though I read in chunks by specifying chunksize?

Problem description: I use python pandas to read a few large CSV files and store them in an HDF5 file; the resulting HDF5 file is about 10GB. The problem happens when reading it back. Even though I tried to read it back in chunks, I still get…
Ewan • 415
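A common cause of this: `chunksize` only takes effect when the data was stored in PyTables "table" format; a fixed-format store is always read in one piece. A sketch (requires pandas with PyTables installed; sizes and names are toy values):

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDF5 support needs PyTables installed

path = os.path.join(tempfile.mkdtemp(), "rows.h5")
df = pd.DataFrame({"x": np.arange(10_000), "y": np.random.rand(10_000)})

# format="table" is what makes chunked reading possible later.
df.to_hdf(path, key="df", format="table")

total = 0
for chunk in pd.read_hdf(path, key="df", chunksize=2_000):
    total += len(chunk)  # process each piece, then let it go
```

Each iteration yields an ordinary DataFrame of at most `chunksize` rows, so peak memory stays bounded by the chunk, not the file.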
15 votes, 7 answers

Converting hdf5 to csv or tsv files

I am looking for sample code that can convert .h5 files to csv or tsv. I have to read .h5 files and the output should be csv or tsv. Sample code would be much appreciated; please help, as I have been stuck on this for the last few days. I followed wrapper classes but…
Sanjay Tiwari • 221
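For a single 2-D numeric dataset, h5py plus numpy is enough. A minimal sketch; the file and dataset names are made up for the demo, and the first block only exists to create an input file:

```python
import os
import tempfile

import numpy as np
import h5py  # assumed installed

d = tempfile.mkdtemp()
h5path, csvpath = os.path.join(d, "in.h5"), os.path.join(d, "out.csv")

# Create a small demo .h5 file to convert.
with h5py.File(h5path, "w") as f:
    f.create_dataset("table", data=np.arange(6.0).reshape(2, 3))

# The actual conversion: read the dataset, write delimited text.
with h5py.File(h5path, "r") as f:
    np.savetxt(csvpath, f["table"][:], delimiter=",")  # use "\t" for TSV

n_commas = open(csvpath).read().count(",")
```

For tabular data written by pandas, `pd.read_hdf(path, key).to_csv(out)` is the shorter route.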
15 votes, 3 answers

Pandas HDF5 as a Database

I've been using python pandas for the last year and I'm really impressed by its performance and functionalities, however pandas is not a database yet. I've been thinking lately on ways to integrate the analysis power of pandas into a flat HDF5 file…
prl900 • 4,029
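The closest pandas gets to database behaviour on HDF5 is "table" format plus `data_columns`, which lets `select` filter rows on disk without loading the whole store. A sketch (requires PyTables; names and sizes are made up):

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDF5 support needs PyTables installed

path = os.path.join(tempfile.mkdtemp(), "store.h5")
df = pd.DataFrame({"id": np.arange(1_000), "val": np.random.rand(1_000)})

with pd.HDFStore(path, complevel=9, complib="blosc") as store:
    # data_columns makes "id" queryable in where clauses.
    store.append("df", df, data_columns=["id"])
    hits = store.select("df", where="id >= 990")  # filtered on disk
```

`append` also allows growing the same table across multiple writes, which is the usual pattern for database-style stores.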
15 votes, 3 answers

Faster reading of time series from netCDF?

I have some large netCDF files that contain 6 hourly data for the earth at 0.5 degree resolution. There are 360 latitude points, 720 longitude points, and 1420 time points per year. I have both yearly files (12 GB ea) and one file with 110 years of…
David LeBauer • 31,011
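Since netCDF-4 files are HDF5 underneath, the usual culprit is chunk layout: with one chunk per time step, a single grid cell's time series touches every chunk. Rechunking along the time axis makes that read cheap. A toy illustration with h5py (dimensions scaled down, names made up):

```python
import os
import tempfile

import numpy as np
import h5py  # netCDF-4 files are HDF5 underneath

path = os.path.join(tempfile.mkdtemp(), "series.h5")
with h5py.File(path, "w") as f:
    # Whole time axis in one chunk per small lat/lon tile, so a
    # point time series is a single chunk-aligned read.
    f.create_dataset("tas", shape=(1420, 36, 72), dtype="f4",
                     chunks=(1420, 4, 4))
    f["tas"][:, 10, 20] = np.arange(1420, dtype="f4")

with h5py.File(path, "r") as f:
    series = f["tas"][:, 10, 20]  # fast: touches one chunk
```

Tools like `nccopy` with its chunking options can rewrite an existing netCDF file into a time-series-friendly layout once, then all subsequent reads benefit.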
15 votes, 5 answers

Saving dictionaries to file (numpy and Python 2/3 friendly)

I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any type of dictionary structure, that may contain other dictionaries, numpy arrays, serializable Python objects, and…
Gustav Larsson • 8,199
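Nested dictionaries map naturally onto HDF5 groups, with arrays and scalars as datasets. A hedged sketch of that mapping (it deliberately skips value types HDF5 cannot hold; `save_dict` is a hypothetical helper, not a library function):

```python
import os
import tempfile

import numpy as np
import h5py  # assumed installed

def save_dict(group, d):
    """Recursively mirror a nested dict into an HDF5 group."""
    for key, value in d.items():
        if isinstance(value, dict):
            save_dict(group.create_group(key), value)  # dict -> group
        else:
            group[key] = value  # numpy arrays and plain scalars

path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    save_dict(f, {"a": np.arange(3), "sub": {"b": 1.5}})

with h5py.File(path, "r") as f:
    b = f["sub"]["b"][()]
    a = f["a"][:]
```

The inverse (walking groups back into dicts) is symmetric; libraries such as deepdish grew out of exactly this pattern.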
15 votes, 1 answer

Storing Pandas objects along with regular Python objects in HDF5

Pandas has a nice interface that facilitates storing things like Dataframes and Series in an HDF5: random_matrix = np.random.random_integers(0,10, m_size) my_dataframe = pd.DataFrame(random_matrix) store = pd.HDFStore('some_file.h5',complevel=9,…
Amelio Vazquez-Reina • 91,494
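One trick from the pandas cookbook: put the DataFrame in the store, then hang ordinary Python objects off the storer's attributes, which PyTables serializes alongside the data. A sketch (requires PyTables; the attribute name `metadata` is made up):

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDF5 support needs PyTables installed

path = os.path.join(tempfile.mkdtemp(), "mixed.h5")
df = pd.DataFrame(np.random.randint(0, 10, size=(4, 3)))

with pd.HDFStore(path, complevel=9, complib="blosc") as store:
    store.put("df", df, format="table")
    # Attach a plain Python object next to the DataFrame.
    store.get_storer("df").attrs.metadata = {"source": "demo", "n": 4}

with pd.HDFStore(path) as store:
    meta = store.get_storer("df").attrs.metadata
```

Anything pickleable can ride along this way, at the cost of being opaque to non-Python HDF5 readers.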
15 votes, 3 answers

g++ compile error: undefined reference to a shared library function which exists

I recently installed the hdf5 library on an ubuntu machine, and am now having trouble linking to the exported functions. I wrote a simple test script readHDF.cpp to explain the issue: #include int main(int argc, char * argv[]) { hid_t …
dermen • 5,252
14 votes, 1 answer

Difference between HDF5 file and PyTables file

Is there a difference between HDF5 files and files created by PyTables? PyTables has two functions .isHDFfile() and .isPyTablesFile() suggesting that there is a difference between the two formats. I've done some looking around on Google and have…
dtlussier • 3,018
14 votes, 1 answer

Floating Point Exception with Numpy and PyTables

I have a rather large HDF5 file generated by PyTables that I am attempting to read on a cluster. I am running into a problem with NumPy as I read in an individual chunk. Let's go with the example: The total shape of the array within the HDF5 file…
Tarun Chitra • 241
14 votes, 1 answer

Setting Attributes on Datasets using HDF5 C++ api

I'm using HDF5 C++ API in HDF5 1.8.7 and would like to use an H5::Attribute instance to set a couple of scalar attributes in an H5::DataSet instance, but cannot find any examples. It's pretty cut and dry using the C API: /* Value of the scalar…
Marc • 4,546
14 votes, 1 answer

How to concat multiple pandas dataframes into one dask dataframe larger than memory?

I am parsing tab-delimited data to create tabular data, which I would like to store in an HDF5. My problem is I have to aggregate the data into one format, and then dump into HDF5. This is ~1 TB-sized data, so I naturally cannot fit this into RAM.…
ShanZhengYang • 16,511