
I write my application data as H5Datasets using the ILNumerics API. With small data (5 x 4098), I am able to visualize the data using HDFView (version 2.9). But for data of size 30 x 4098 and more, HDFView crashes silently as soon as I try to look at the dataset table.

I am not using any compression parameters. Here is sample code that reproduces the issue:

private void button1_Click(object sender, EventArgs e)
{
    using (var file = new H5File("testwrite.h5"))
    {
        var ds = new H5Dataset("data");     // dataset created without initial data
        file.Add(ds);
        ds.Set(ILMath.rand(30, 4096));      // data written afterwards
    }
}

I need to be able to visualize/copy the data from the H5 file. Any pointers, please? Thanks

Neelima

2 Answers


If HDFView does not have enough memory to open a dataset, it will fail to open it. See the HDFView User's Guide:


5.2 Subset and Dimension Selection

Opening an entire large dataset may cause an ‘Out Of Memory Error’ because the Java Virtual Machine cannot create the required objects. HDFView provides options to select a subset of a dataset for display. You can also select dimensions and the order of dimensions to display, e.g., to switch the columns and rows.


However, HDFView should not crash; it should report an "Out of Memory Error". It does not always do that: there are many cases where HDFView does not handle exceptions properly, and we are working to correct those issues.

I was able to create a 64-bit integer dataset of size 7000x7000 and view it with HDFView 2.10.1 on my machine, but my machine has lots of memory.

If you have a large dataset that is chunked, you can view a subset of it, as described in Section 5.2 of the User's Guide: right-click on the dataset and select "Open As" from the pop-up menu. A window opens from which you can select a subset of the dataset to view. With chunking, only the chunks that contain the subset are read, so the entire dataset does not have to be loaded into memory.
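If HDFView keeps failing, you can also read the data back programmatically and inspect or copy it from your own code instead of viewing it in HDFView. A minimal sketch in ILNumerics, assuming First<H5Dataset>() and Get<double>() as the lookup/read counterparts of the Add()/Set() calls shown in the question (check the ILNumerics documentation for the exact signatures):

using (var file = new H5File("testwrite.h5"))
{
    // look the dataset up under the name it was created with
    var ds = file.First<H5Dataset>("data");

    // read the full content back into an ILNumerics array
    ILArray<double> A = ds.Get<double>();

    // ... visualize, export or copy A from here
}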

If you would like to follow up on this, please do contact me at The HDF Group Helpdesk. See this page for details: http://www.hdfgroup.org/about/contact.html

Thanks!

-Barbara

HDF Helpdesk

  • Thanks Barbara for your answer. But 30 x 4098 is not a huge dataset to throw an OutOfMemory exception. I begin to think that the culprit is Java, as Haymo from ILNumerics clarified that he is able to open the same file on his system without any issues. – Neelima Aug 13 '14 at 16:38

It turned out that the problem was caused by a chunk size that was too small for the dataset.

In ILNumerics, the chunk size is derived automatically from the initial data given at the time of creation. If no data are given to the constructor, the chunk size is set to 1x1.

This is a suboptimal design in the ILNumerics API: it should prevent the user from accidentally creating datasets with too-small chunk sizes. Instead, it does not even complain about a missing data argument in the constructor. We will make the creation API more explicit in the next release.

To fix the problem, just provide some array to the constructor of the dataset, with a size you consider reasonable as the chunk size for the new dataset. Keep in mind that the chunk size cannot be changed once the dataset has been created!

using (var file = new H5File("testwrite.h5"))
{
    // the initial data determine the chunk size of the new dataset
    var ds = new H5Dataset("data", ILMath.rand(30, 4096));
    file.Add(ds);
    // ds.Set(ILMath.rand(30, 4096)); <- this is not needed anymore
}

If for some reason you don't have true initial data at the time of creating the dataset, you may either

  • delay the creation until those data are available, or
  • use dummy data in the dataset constructor to determine the chunk size (see the sketch below).
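
A minimal sketch of the dummy-data variant, reusing the calls shown above (ILMath.zeros is assumed here merely as a convenient way to create a placeholder array of the intended chunk shape):

using (var file = new H5File("testwrite.h5"))
{
    // dummy data of the intended shape fix the chunk size at creation time
    var ds = new H5Dataset("data", ILMath.zeros(30, 4096));
    file.Add(ds);

    // later, once the real data are available, overwrite the placeholder values
    ds.Set(ILMath.rand(30, 4096));
}
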
Haymo Kutschbach