Questions tagged [hdf]

Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data.

Originally developed at the National Center for Supercomputing Applications, it is supported by the non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF.

In keeping with this goal, the HDF format, libraries and associated tools are available under a liberal, BSD-like license for general use. HDF is supported by many commercial and non-commercial software platforms, including Java, MATLAB/Scilab, Octave, IDL, Python, and R. The freely available HDF distribution consists of the library, command-line utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView).

There are two major versions of HDF, HDF4 and HDF5, which differ significantly in design and API.

Wikipedia: http://en.wikipedia.org/wiki/Hierarchical_Data_Format
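Because HDF4 and HDF5 are different formats that need different libraries (for example pyhdf for HDF4 versus h5py for HDF5 in Python), it helps to know which one a file actually is before picking a reader. The sketch below checks the leading bytes. It relies on two format facts: HDF5 files carry the 8-byte signature `\x89HDF\r\n\x1a\n` at offset 0, or at 512, 1024, 2048, ... when a user block precedes it, and HDF4 files start with the magic number 0x0e031301. The function name `hdf_version` is just an illustrative choice.

```python
import os

# HDF5 signature: 8 bytes at offset 0, or at 512, 1024, 2048, ... when the
# file starts with a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# HDF4 magic number 0x0e031301, stored in the first 4 bytes of the file.
HDF4_MAGIC = b"\x0e\x03\x13\x01"


def hdf_version(path):
    """Return "HDF4", "HDF5", or None, judged only from the file's leading bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
        if head.startswith(HDF4_MAGIC):
            return "HDF4"
        if head == HDF5_SIGNATURE:
            return "HDF5"
        # Probe the possible user-block offsets for the HDF5 signature.
        offset = 512
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "HDF5"
            offset *= 2
    return None
```

Note that netCDF-4 files are stored in HDF5 underneath, so this check reports them as "HDF5". Classic netCDF-3 files match neither signature.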

344 questions
1 vote · 0 answers

Relatively large HDF5 files using h5py - sanity check

I'm doing deep learning with Caffe and generating my own dataset in HDF5 format. I have 131,976 images, all 224x224, which come to about 480MB, and each image has a 1x6 array as a label. I've found that when I generate the .h5 files, they come to 5GB…
Joe Watson
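A quick sanity check for the question above: assuming the 480MB figure is the size of the compressed image files (the question does not say), an HDF5 dataset holds decoded pixel arrays, which are much larger unless an HDF5 compression filter is applied. The sketch estimates the uncompressed size for a few hypothetical pixel layouts; the channel count and dtype are assumptions, not details from the question.

```python
def raw_size_bytes(n_items, shape, bytes_per_value):
    """Uncompressed size in bytes of n_items arrays with the given shape."""
    values_per_item = 1
    for dim in shape:
        values_per_item *= dim
    return n_items * values_per_item * bytes_per_value


N_IMAGES = 131_976  # image count from the question

# Hypothetical layouts: the question does not say how pixels are stored.
layouts = [
    ("1 channel, uint8", (224, 224), 1),
    ("3 channels, uint8", (3, 224, 224), 1),
    ("3 channels, float32", (3, 224, 224), 4),
]
for label, shape, nbytes in layouts:
    gib = raw_size_bytes(N_IMAGES, shape, nbytes) / 2**30
    print(f"{label}: {gib:.1f} GiB")
```

Even the smallest layout comes to about 6.2 GiB before compression, more than ten times the 480MB of source images, so a multi-gigabyte .h5 file is not surprising in itself. With h5py, passing `compression="gzip"` to `create_dataset` is one way to shrink the file, at some cost in read speed.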
1 vote · 1 answer

Pandas to_hdf succeeds but then read_hdf fails

Pandas to_hdf succeeds but then read_hdf fails when I use custom objects as column headers (I use custom objects because I need to store other info in them). Is there some way to make this work? Or is this just a Pandas bug or PyTables bug? As an…
aiai
1 vote · 0 answers

Appending HDFStore fails, cannot match existing table structure

I'm running into problems when trying to send a dataframe to HDF5 in small chunks via pd.HDFStore('mystore.h5', mode='a').append(my_frame, chunk). The chunks are all the same in terms of columns and types (they come from the same dataframe), but it…
asdf
1 vote · 2 answers

How to mosaic the same HDF files using this R function?

There are more than 1,000 MODIS HDF images in a folder, M:\join. Their names show which files must be mosaiced together. For example, in the files below, 2009090 means these three images must be mosaiced together:…
Canada2015
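The grouping step in the question above is independent of the mosaicking itself: collect the file names that share a date token, then hand each group to the mosaicking function. The sketch is in Python rather than R, and it assumes the usual MODIS naming pattern (for example a hypothetical "MOD13Q1.A2009090.h21v05.005.hdf", where "A2009090" means year 2009, day-of-year 090); the real names in the folder are not shown in the question.

```python
import re
from collections import defaultdict

# Assumed MODIS naming pattern: ".A" followed by a 7-digit YYYYDDD date token.
DATE_TOKEN = re.compile(r"\.A(\d{7})\.")


def group_by_date(filenames):
    """Map each YYYYDDD token to the sorted list of files that share it."""
    groups = defaultdict(list)
    for name in filenames:
        match = DATE_TOKEN.search(name)
        if match:  # files without a date token are skipped
            groups[match.group(1)].append(name)
    return {date: sorted(names) for date, names in groups.items()}
```

Called as `group_by_date(os.listdir(r"M:\join"))`, it would return one list per date; each list is then one mosaicking job.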
1 vote · 0 answers

HDF Store: Saving list of float-tuples to file

I have a big csv file in which one column contains a list of GPS coordinates as float-tuples. Of course, if I read in the file as a pandas dataframe, their type is simply String, which is not that useful. What I want to do is to convert the Strings…
SGer
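One way to approach the question above: parse each string safely with `ast.literal_eval`, then flatten the lists into one row per point, because tabular HDF5 storage works best with fixed-width numeric columns rather than cells holding variable-length lists. This is a sketch under the assumption that each cell looks like "[(lat, lon), (lat, lon), ...]"; the actual string format is cut off in the question, and the helper names are hypothetical.

```python
import ast


def parse_coords(text):
    """Turn a string such as "[(52.5, 13.4), (52.6, 13.5)]" into a list of float tuples."""
    value = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return [(float(lat), float(lon)) for lat, lon in value]


def explode(rows):
    """Flatten (row_id, coords_string) pairs into (row_id, point_index, lat, lon) records."""
    for row_id, text in rows:
        for i, (lat, lon) in enumerate(parse_coords(text)):
            yield row_id, i, lat, lon
```

The resulting long-format records can be loaded into a DataFrame with plain float columns, which pandas can then write in table format.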
1 vote · 1 answer

pandas transform a csv into a h5 file avoiding memory error

I have this simple code: data = pd.read_csv(file_path + 'PSI_TS_clean.csv', nrows=None, names=None, usecols=None) data.to_hdf(file_path + 'PSI_TS_clean.h5', 'table'), but my data is too big and I run into memory issues. What is…
Donbeo
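The usual way around the memory error in the question above is to never hold the whole CSV in memory: read a bounded number of rows, write them, and repeat. In pandas this corresponds to `pd.read_csv(path, chunksize=...)`, which yields DataFrames, with each chunk written via `to_hdf(..., format="table", append=True)`. The stdlib sketch below shows only the chunked-reading pattern; `iter_csv_chunks` is a hypothetical helper name, and the HDF5 writing step is left to whichever library you use.

```python
import csv
from itertools import islice


def iter_csv_chunks(path, chunksize):
    """Yield successive lists of at most `chunksize` rows, keeping one chunk in memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            chunk = list(islice(reader, chunksize))
            if not chunk:  # reader exhausted
                return
            yield chunk
```

When appending chunks to one HDF5 table, string columns may need an explicit width (pandas exposes `min_itemsize` for this), since a later chunk can contain longer strings than the first.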
1 vote · 1 answer

HDFStore.select an order of magnitude slower than DataFrame slicing?

Given a simple DataFrame with an integer index and a float column, this code: store = pd.HDFStore('test.hdf5') print store.select('df', where='index >= 50000')['A'].mean() is at least 10 times slower than this code: store =…
flyingmig
1 vote · 1 answer

Is it possible to read the .Rdata file format from C or Fortran?

I'm writing some R extensions in C (C functions to be called from R). My code needs to compute a statistic using 2 different datasets at the same time, and I need to perform this with all possible pair combinations. Then, I need all these…
srodrb
1 vote · 0 answers

HDF5 database design: lookup tables vs. storage of redundant data

I've worked on several fairly small-scale legacy HDF5 databases, and each one uses grouping to perform lookups. As a contrived example, let's say I have one two-dimensional dataset where each cell maps back to a group which may store…
alexa
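The lookup-table side of the trade-off in the question above amounts to dictionary encoding: store each distinct record once in a table, and store only small integer codes in the large dataset. In HDF5 terms that would typically be an integer dataset plus a compound-type table; the sketch shows the encoding logic in plain Python, with hypothetical function names and records assumed to be hashable tuples.

```python
def encode(records):
    """Replace repeated records with integer codes into a table of unique records."""
    table = []   # unique records, in first-seen order
    index = {}   # record -> position in table
    codes = []
    for rec in records:
        if rec not in index:
            index[rec] = len(table)
            table.append(rec)
        codes.append(index[rec])
    return codes, table


def decode(codes, table):
    """Rebuild the original sequence of records from codes and the lookup table."""
    return [table[c] for c in codes]
```

The lookup table saves space when records repeat often, at the cost of one extra indirection per read; storing the redundant data directly avoids that indirection but grows with every repeat.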
1 vote · 1 answer

NetCDF 4.5 Java: problems with NetCDF version 4 files, and old code for HDF does not work

I have a NetCDF version 3 file. I used the latest ncks for Windows (released 1 Oct 2014) to rechunk it with ncks -4 --cnk_dmn lat,4 --cnk_dmn lon,4 --cnk_dmn time,512 2014.nc 2014_chunked.nc, which produced the 2014_chunked.nc file of NetCDF version 4…
Antonio
1 vote · 1 answer

Pandas HDFStore - Get Last Record from Multiple Tables

I have a large number of data frames exported to a series of HDFStore files through Pandas. I need to be able to quickly pull in the most recent record, for each of these dataframes on demand. The setup: File…
bazel
1 vote · 0 answers

How to read HDF data from HDFS for Hadoop

I am working on image processing on Hadoop, using HDF satellite data. I can access and use JPG and other image types in Hadoop streaming, but with HDF data I get an error. Hadoop couldn't read HDF data from…
prabu
1 vote · 0 answers

How to enable HDFS caching on Amazon EMR?

What's the easiest way to enable HDFS caching on EMR? More specifically, how do I set dfs.datanode.max.locked.memory and increase the "maximum size that may be locked into memory" (ulimit -l) on all nodes? The following code seems to work fine for…
Jerome Serrano
1 vote · 1 answer

Can the frequency of a Pandas tseries DatetimeIndex be preserved when writing to an HDFStore?

I have a Pandas DataFrame in which the index is (notice the Freq: H) - [2011-01-01 00:00:00, ..., 2013-12-31 23:00:00] Length: 26304, Freq: H, Timezone: None There are multiple columns but the first few…
DavidJ
1 vote · 1 answer

HDF4 file on Anaconda distribution of Python

I am trying to read an HDF4 file with my Anaconda Python distributions on 64-bit Windows 7. I have tried to do a conda install of both the pyhdf and pyNio packages, but Anaconda seems to find neither. Does anyone have any advice on how to do this? I…
AGK