Questions tagged [hdf]

Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data.

Originally developed at the National Center for Supercomputing Applications, it is supported by the non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF.

In keeping with this goal, the HDF format, libraries and associated tools are available under a liberal, BSD-like license for general use. HDF is supported by many commercial and non-commercial software platforms, including Java, MATLAB/Scilab, Octave, IDL, Python, and R. The freely available HDF distribution consists of the library, command-line utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView).

There are two major versions of HDF: HDF4 and HDF5, which differ significantly in design and API.

Wikipedia: http://en.wikipedia.org/wiki/Hierarchical_Data_Format

344 questions
3
votes
1 answer

Pandas - read_hdf or store.select returning incorrect results for a query

I have a large dataset (4 million rows, 50 columns) stored via pandas store.append. When I use either store.select or read_hdf with a query for two columns being greater than a certain value (i.e. "(a > 10) & (b > 1)") I get 15,000 or so rows…
Neil
  • 31
  • 2
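A common cause of wrong `where` results is querying columns that were not declared as `data_columns` when the table was written. A minimal sketch of the queryable setup (assuming PyTables is installed; the file and column names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 20, 10_000),
    "b": rng.integers(0, 5, 10_000),
    "c": rng.standard_normal(10_000),
})

with pd.HDFStore("demo_query.h5", mode="w") as store:
    # Columns referenced in a `where` clause must be declared as
    # data_columns when appending, or they are not individually queryable.
    store.append("df", df, data_columns=["a", "b"])

result = pd.read_hdf("demo_query.h5", "df", where="(a > 10) & (b > 1)")
expected = df[(df["a"] > 10) & (df["b"] > 1)]
assert len(result) == len(expected)
```

Comparing the query result against an in-memory boolean mask, as above, is a quick way to confirm whether the store is returning the right rows.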
3
votes
1 answer

Writing 2-D array int[n][m] to HDF5 file using Visual C++

I'm just getting started with HDF5 and would appreciate some advice on the following. I have a 2-d array: data[][] passed into a method. The method looks like: void WriteData( int data[48][100], int sizes[48]) The size of the data is not actually…
Dave
  • 8,095
  • 14
  • 56
  • 99
2
votes
0 answers

How can I read an in-memory HDF byte array back into a pandas DataFrame without writing a file?

I can convert a pandas dataframe to an HDF byte array like so: data = ... # a pandas dataframe with pd.HDFStore( "in-memory-save-file", mode="w", driver="H5FD_CORE", …
Edy Bourne
  • 5,679
  • 13
  • 53
  • 101
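For the reverse direction, PyTables (which backs `pd.HDFStore`) can open a raw HDF5 byte image with the CORE driver. A sketch under the assumption that pandas forwards the `driver_core_*` keywords to `tables.open_file`; note `_handle` is private pandas/PyTables API and may change:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Write to an in-memory HDF5 image; backing_store=0 keeps it off disk.
with pd.HDFStore("in-memory-save", mode="w", driver="H5FD_CORE",
                 driver_core_backing_store=0) as store:
    store.put("df", df)
    image = store._handle.get_file_image()  # raw HDF5 bytes

# Read the byte image back, again without touching the filesystem.
with pd.HDFStore("in-memory-load", mode="r", driver="H5FD_CORE",
                 driver_core_backing_store=0,
                 driver_core_image=image) as store:
    df2 = store["df"]

assert df.equals(df2)
```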
2
votes
1 answer

Can you read HDF5 dataset directly into SharedMemory with Python?

I need to share a large dataset from an HDF5 file between multiple processes and, for a set of reasons, mmap is not an option. So I read it into a numpy array and then copy this array into shared memory, like this: import h5py from multiprocessing…
monday
  • 319
  • 2
  • 12
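One way to avoid the intermediate private copy is to allocate the shared block first and let h5py fill it in place with `Dataset.read_direct`. A minimal sketch (the file and dataset names are made up):

```python
import numpy as np
import h5py
from multiprocessing import shared_memory

# Create a sample file as a stand-in for the real large dataset.
with h5py.File("shm_demo.h5", "w") as f:
    f.create_dataset("big", data=np.arange(1_000_000, dtype=np.float64))

with h5py.File("shm_demo.h5", "r") as f:
    ds = f["big"]
    shm = shared_memory.SharedMemory(create=True, size=ds.nbytes)
    # View the shared buffer as an ndarray and have h5py read into it
    # directly, skipping the private intermediate array.
    arr = np.ndarray(ds.shape, dtype=ds.dtype, buffer=shm.buf)
    ds.read_direct(arr)

val = float(arr[123])
assert val == 123.0

del arr          # drop the view before releasing the buffer
shm.close()
shm.unlink()
```

Other processes would attach with `SharedMemory(name=shm.name)` and build the same `np.ndarray` view over `shm.buf`.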
2
votes
0 answers

Error after HDF5 file transfer via socket: OSError: Unable to open file

I recently wrote a client/server application for transferring files in the hickle (HDF5-based) format. Exporting, importing and reading the generated file works as intended. The source code of the client, which sends the file, looks like…
Aiden3301
  • 41
  • 5
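A frequent cause of a corrupted file after a socket transfer is assuming a single `recv()` returns the whole payload. A hedged sketch of a length-prefixed transfer, independent of hickle itself:

```python
import socket
import struct

def send_file(sock, payload: bytes) -> None:
    # Length-prefix the payload so the receiver knows when it is complete.
    sock.sendall(struct.pack("!Q", len(payload)) + payload)

def recv_exactly(sock, n: int) -> bytes:
    # recv() may return fewer bytes than requested; loop until done.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-transfer")
        buf += chunk
    return buf

def recv_file(sock) -> bytes:
    (length,) = struct.unpack("!Q", recv_exactly(sock, 8))
    return recv_exactly(sock, length)

# Demo over a local socket pair, standing in for the client/server link.
a, b = socket.socketpair()
sent = b"\x89HDF\r\n\x1a\n" + b"x" * 10_000   # fake HDF5-like payload
send_file(a, sent)
received = recv_file(b)
assert received == sent
```

If the receiver writes a short or padded byte stream to disk, h5py fails with exactly "OSError: Unable to open file", so verifying the byte count end to end is a good first diagnostic.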
2
votes
0 answers

Python freezes when writing netCDF in NETCDF4 format, but NETCDF3_CLASSIC format works fine

I am using Python to save 300x470x27x24 matrices to netCDF files. So far I have been using the following code to create the file handle: nc_output = Dataset(nc_filename,'w',format='NETCDF3_CLASSIC') It works fine, but I would like to save some disk space and…
Marcin Kawka
  • 277
  • 2
  • 9
2
votes
1 answer

How to work with HDF file (fixed format, multiple keys) as a pandas dataframe?

I was given a 20GB HDF5 file created using pandas, but unfortunately written in the fixed format (rather than table) with each column written as a separate key. This works nicely for quickly loading one feature, but it doesn't allow handy…
Shawn
  • 573
  • 3
  • 7
  • 17
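Since fixed-format keys can only be read whole, one workaround is to load just the keys (columns) you need and reassemble them into a frame. A small stand-in sketch (the key names are invented):

```python
import numpy as np
import pandas as pd

# Build a small stand-in: each column under its own key, fixed format.
with pd.HDFStore("cols_demo.h5", mode="w") as store:
    for name in ["f1", "f2", "f3"]:
        store.put(name, pd.Series(np.arange(5), name=name))  # fixed format

# Reassemble selected keys into a single DataFrame.
with pd.HDFStore("cols_demo.h5", mode="r") as store:
    df = pd.concat({k.lstrip("/"): store[k] for k in store.keys()}, axis=1)

assert sorted(df.columns) == ["f1", "f2", "f3"]
```

For repeated row-wise queries on a 20GB file, it may still be worth a one-off conversion: read the columns in batches and `store.append` them into a new table-format store with `data_columns` set.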
2
votes
0 answers

Most efficient way of saving a pandas dataframe or 2d numpy array into h5py, with each row a separate key, using a column

This is a follow-up to this Stack Overflow question: Column missing when trying to open hdf created by pandas in h5py. I am trying to save a large amount of data to disk (too large to fit into memory) and retrieve specific rows of…
SantoshGupta7
  • 5,607
  • 14
  • 58
  • 116
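For reference, a row-per-key layout in h5py looks like the sketch below; note that one dataset per row carries significant per-object overhead, so a single 2-D dataset plus an id-to-index map usually scales better to millions of rows. Names are illustrative:

```python
import numpy as np
import h5py

rows = np.random.default_rng(0).standard_normal((100, 512)).astype(np.float32)

with h5py.File("rows_demo.h5", "w") as f:
    # One dataset per row, keyed by an id string from the key column.
    for i, row in enumerate(rows):
        f.create_dataset(f"id_{i}", data=row)

with h5py.File("rows_demo.h5", "r") as f:
    got = f["id_7"][:]

assert np.array_equal(got, rows[7])
```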
2
votes
1 answer

Appending pandas data to hdf store, getting 'TypeError: object of type 'int' has no len()' error

Motivation: I have about 30 million rows of data, one column being an index value, the other being a list of 512 int32 numbers. I wish to only retrieve maybe a thousand or so at a time, so I want to create some sort of datastore that can look up the…
SantoshGupta7
  • 5,607
  • 14
  • 58
  • 116
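The `TypeError` typically comes from the list column being object dtype, which `store.append` cannot map to a table column. One workaround is to expand each fixed-length vector into its own numeric column (shown with 8-element vectors for brevity; the question uses 512):

```python
import numpy as np
import pandas as pd

n = 1_000
df = pd.DataFrame({
    "idx": np.arange(n),
    "vec": [np.arange(8, dtype=np.int32) for _ in range(n)],  # object dtype
})

# Expand the vectors into fixed-width numeric columns so the table
# is appendable and the index column is queryable.
wide = pd.DataFrame(np.vstack(df["vec"].to_numpy()),
                    columns=[f"v{i}" for i in range(8)])
wide.insert(0, "idx", df["idx"])

with pd.HDFStore("vectors_demo.h5", mode="w") as store:
    store.append("table", wide, data_columns=["idx"])
    out = store.select("table", where="idx == 7")

assert out.shape == (1, 9)
```

With `data_columns=["idx"]`, `store.select` can pull a thousand rows by index value without scanning the whole 30-million-row table.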
2
votes
2 answers

OSError: Unable to open file (file signature not found) / End of HDF5 error back trace

I have a small (< 6Mb) .hdf file (obtained from the LAADS DAAC service). I have tried pandas and h5py to open it, to no avail (code shown below). I also tested the file with: $ h5dump -n data.hdf h5dump error: unable to open file "data.hdf" and $…
Gabriel
  • 40,504
  • 73
  • 230
  • 404
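"File signature not found" usually means the file is not HDF5 at all — LAADS DAAC commonly serves HDF4 files, which need pyhdf or GDAL rather than h5py/pandas. The first bytes of the file distinguish the two formats; a small checker (the demo file names are invented):

```python
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"   # first 8 bytes of every HDF5 file
HDF4_MAGIC = b"\x0e\x03\x13\x01"    # HDF4 files start differently

def hdf_flavor(path):
    """Peek at the signature to tell HDF4 from HDF5 (or neither)."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head == HDF5_MAGIC:
        return "HDF5"
    if head.startswith(HDF4_MAGIC):
        return "HDF4"
    return "unknown"

# Demo with hand-written signatures (stand-ins for real downloads).
with open("h5_demo.bin", "wb") as fh:
    fh.write(HDF5_MAGIC + b"\x00" * 8)
with open("h4_demo.bin", "wb") as fh:
    fh.write(HDF4_MAGIC + b"\x00" * 12)

assert hdf_flavor("h5_demo.bin") == "HDF5"
assert hdf_flavor("h4_demo.bin") == "HDF4"
```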
2
votes
1 answer

Reading hdf file - https and Xarray

I am trying to read hdf files, over an https connection, from the Harmonized Landsat Sentinel repository (here: https://hls.gsfc.nasa.gov/data/v1.4/). Ideally, I would use xarray to do this. Here is an example: Example of…
Rowan_Gaffney
  • 452
  • 5
  • 17
2
votes
2 answers

How can I convert latitude-longitude to NASA's EASE-Grid?

I am working with he5 files downloaded from NASA EOSDIS. I successfully read the files using the rhdf5 package in R. The hdf file has subdatasets consisting of a matrix, dim 721 x 721. As far as I understand the file is built as EASE-Grid, which does not…
mr.salomon
  • 33
  • 5
2
votes
1 answer

Writing a Jagged Array in HDF5 using the Java Native Library

I have tried numerous ways and followed some of the examples that are scattered around the web on how to write a jagged array (an array of arrays that may be of differing lengths) in HDF5. Most of the examples are in C and rather low-level. Anyhow I…
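The question concerns the HDF5 Java native library, but the underlying mechanism is the same in every binding: a jagged array is stored with an HDF5 variable-length datatype. As a sketch of the concept in Python, where h5py exposes it via `vlen_dtype` (file and key names invented):

```python
import numpy as np
import h5py

jagged = [np.arange(3, dtype=np.int64),
          np.arange(5, dtype=np.int64),
          np.arange(1, dtype=np.int64)]

with h5py.File("jagged_demo.h5", "w") as f:
    # A variable-length datatype lets each element hold a different
    # number of int64 values.
    vlen = h5py.vlen_dtype(np.int64)
    ds = f.create_dataset("jagged", shape=(len(jagged),), dtype=vlen)
    for i, row in enumerate(jagged):
        ds[i] = row

with h5py.File("jagged_demo.h5", "r") as f:
    back = f["jagged"][:]

assert [len(r) for r in back] == [3, 5, 1]
```

In the Java API the equivalent is a dataset created with a variable-length datatype over the base integer type; the file layout is identical, so files written either way are interchangeable.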
2
votes
0 answers

Best way to read / broadcast parallel data to all processors in hdf5

I have a parallel fortran application and I would like to read in medium-to-large data arrays that will be "mirrored" on all processors. In other words, this is "global" data such as observation locations which is necessary for all processing…
Menos
  • 43
  • 5
2
votes
1 answer

Can't access external link with python + h5py

Recently I have started working with .hdf5 files and still can't figure out how to properly use external links. I have a few .hdf5 files, each with the same structure, e.g. same keys and data types. I want to merge them into one file but…
Denis
  • 719
  • 2
  • 8
  • 23
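For reference, creating and resolving external links in h5py looks like the sketch below (file and key names invented). Links resolve at access time, so the target files must be reachable from the process's working directory, which is a common reason access fails:

```python
import numpy as np
import h5py

# Two source files with the same structure.
for name in ["part1.h5", "part2.h5"]:
    with h5py.File(name, "w") as f:
        f.create_dataset("data", data=np.arange(4))

# Merge them into one file via external links instead of copying data.
with h5py.File("merged.h5", "w") as f:
    f["part1"] = h5py.ExternalLink("part1.h5", "/data")
    f["part2"] = h5py.ExternalLink("part2.h5", "/data")

with h5py.File("merged.h5", "r") as f:
    merged_part1 = f["part1"][:]

assert merged_part1.tolist() == [0, 1, 2, 3]
```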