
I'm trying to load a dataset from an HDF5 file in C# (.NET Framework) so that I end up with the contents in an array, e.g. float[,]. I found the HDF.PInvoke library, but I find it very difficult to figure out how to use it.

Update

Thanks to Soonts' answer, I managed to get it to work. Here's my working snippet:

using System;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            string datasetPath = "/dense1/dense1/kernel:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);
            long typeId = H5D.get_type(dataSetId);

            // read array (the shape can be queried with H5S.get_simple_extent_dims)
            float[,] arr = new float[162, 128];
            GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
            try
            {
                H5D.read(dataSetId, typeId, H5S.ALL, H5S.ALL, H5P.DEFAULT,
                         gch.AddrOfPinnedObject());
            }
            finally
            {
                gch.Free();
            }

            // release the HDF5 handles
            H5T.close(typeId);
            H5D.close(dataSetId);
            H5F.close(fileId);

            // show one entry
            Console.WriteLine(arr[13, 87].ToString());

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

Original first attempt:

What I've managed so far:

using System;
using System.IO;
using System.Runtime.InteropServices;
using HDF.PInvoke;

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            string datasetPath = "/dense1/dense1/bias:0";
            long fileId = H5F.open(@"\path\to\weights.h5", H5F.ACC_RDONLY);
            long dataSetId = H5D.open(fileId, datasetPath);
            long typeId = H5D.get_type(dataSetId);
            long spaceId = H5D.get_space(dataSetId);

            // not sure about this
            TextWriter tw = Console.Out;
            GCHandle gch = GCHandle.Alloc(tw);

            // I was hoping that this would write to the Console, but the
            // program crashes outside the scope of the C# debugger.
            H5D.read(
                dataSetId,
                typeId,
                H5S.ALL,
                H5S.ALL,
                H5P.DEFAULT,
                GCHandle.ToIntPtr(gch)
            );

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

The parameters of H5D.read() are:

Type    Name            Description
--------------------------------------------------------------
long    dset_id         Identifier of the dataset read from.
long    mem_type_id     Identifier of the memory datatype.
long    mem_space_id    Identifier of the memory dataspace.
long    file_space_id   Identifier of the dataset's dataspace in the file.
long    plist_id        Identifier of a transfer property list for this I/O operation.
IntPtr  buf             Buffer to receive data read from file.

Question

Could anyone help me fill in the blanks here?

Kris
  • BTW, HDF5 is crap. API design and documentation are not great. It’s slow. It’s single threaded, crashes when trying to use from multiple threads, even for different files. It’s unreliable, if anything goes wrong often the complete file with all datasets is destroyed. If you gonna use it to write stuff not just read, I recommend looking for an alternative. – Soonts Aug 02 '19 at 07:29
  • Believe me, I'm super frustrated by the whole thing too. It's just that I need it to deserialize some model weights created by keras, which uses h5py. – Kris Aug 02 '19 at 08:25
  • You can replace your reshape code with a single line `Buffer.BlockCopy( arrFlat, 0, arr, 0, w * h * 4 );` Gonna be much faster. Not that it matters for such small arrays, but still. – Soonts Aug 02 '19 at 08:56
  • Also, since you don’t need to transpose or pad your data, you can try to pin the 2D array, and pass that to native library. Will probably work, too, this way you don't need to copy/reshape at all. – Soonts Aug 02 '19 at 08:57
  • You're right, using a 2D array directly works as well. This is quite important for my use-case, because some of my arrays are huge. So thanks again! – Kris Aug 05 '19 at 07:04
  • Could you please clarify if the `kernel:0` is something important to the syntax? Or is that just the name of your dataset? I'm having trouble getting it to find my dataset. In python I named it `J`. But when I load it in c# using `datasetPath = "/J"`; it doesn't find it. – Adam B Apr 14 '20 at 16:15
  • @AdamB `kernel:0` is just the name that Keras has given to the dataset upon calling `keras_model.save_weights('weights.h5')` (in python). – Kris Apr 15 '20 at 00:41

2 Answers


You need to create an array (a regular 1D one, not a 2D one) of the correct size and type. Then write something like this:

int width = 1920, height = 1080;
float[] data = new float[ width * height ];
var gch = GCHandle.Alloc( data, GCHandleType.Pinned );
try
{
    H5D.read( /* skipped */, gch.AddrOfPinnedObject() );
}
finally
{
    gch.Free();
}

This will read the dataset into the data array. You can then copy individual rows into another, 2D array if you need that.
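If the 2D array is wanted without a per-element loop, the copy can be done in one call, as suggested in the comments on the question. The following is only a sketch: the `ArrayReshape` class and `To2D` method are hypothetical names, and it assumes the flat buffer is in row-major order (which is how C# lays out `float[,]`).

```csharp
using System;

static class ArrayReshape
{
    // Copies a flat, row-major buffer into a new [rows, cols] array.
    public static float[,] To2D(float[] flat, int rows, int cols)
    {
        if (flat.Length != rows * cols)
            throw new ArgumentException("flat.Length must equal rows * cols");

        var result = new float[rows, cols];
        // Buffer.BlockCopy takes its count in bytes, not in elements.
        Buffer.BlockCopy(flat, 0, result, 0, flat.Length * sizeof(float));
        return result;
    }
}
```

With the variables from the snippet above this would be called as `float[,] arr = ArrayReshape.To2D(data, height, width);`, assuming the dataset is stored as height rows of width values.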

Consult the API documentation to find out how to get the number of dimensions and the size of the dataset (HDF5 supports datasets with an arbitrary number of dimensions; for a 2D dataset the size is two integers). That tells you the buffer size you need: for a 2D dataset, it's width * height.
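As a rough sketch of that lookup with HDF.PInvoke: the dataspace of the dataset can be asked for its rank and dimensions before the buffer is allocated. The `DatasetReader` class and `ReadFloat2D` method are hypothetical names, the dataset is assumed to be 2D, and `H5T.NATIVE_FLOAT` is used as the memory type on the assumption that the stored values can be converted to 32-bit floats.

```csharp
using System;
using System.Runtime.InteropServices;
using HDF.PInvoke;

static class DatasetReader
{
    // Reads a 2D dataset into float[,], sizing the buffer from the file itself.
    public static float[,] ReadFloat2D(long fileId, string datasetPath)
    {
        long dataSetId = H5D.open(fileId, datasetPath);
        if (dataSetId < 0)
            throw new InvalidOperationException("Could not open " + datasetPath);

        long spaceId = H5D.get_space(dataSetId);
        try
        {
            // Rank first, then one extent per dimension.
            int rank = H5S.get_simple_extent_ndims(spaceId);
            if (rank != 2)
                throw new InvalidOperationException("Expected rank 2, got " + rank);

            ulong[] dims = new ulong[rank];
            H5S.get_simple_extent_dims(spaceId, dims, null);

            int rows = checked((int)dims[0]);
            int cols = checked((int)dims[1]);
            float[,] arr = new float[rows, cols];

            // Pin the managed array so the native library can write into it.
            GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
            try
            {
                int status = H5D.read(dataSetId, H5T.NATIVE_FLOAT, H5S.ALL,
                                      H5S.ALL, H5P.DEFAULT,
                                      gch.AddrOfPinnedObject());
                if (status < 0)
                    throw new InvalidOperationException("H5D.read failed");
            }
            finally
            {
                gch.Free();
            }
            return arr;
        }
        finally
        {
            // Release the native handles even if something above threw.
            H5S.close(spaceId);
            H5D.close(dataSetId);
        }
    }
}
```

The caller remains responsible for opening the file with `H5F.open` and closing it with `H5F.close`.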

As for the element type, you had better know that in advance, e.g. float is fine.
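If the element type is not known for certain, it may be worth checking what the file actually stores before reading into a `float[]`. This is a minimal sketch, assuming the `H5T.get_class` and `H5T.get_size` bindings of HDF.PInvoke; the `DatasetTypeCheck` class and `IsFloat32` method are hypothetical names.

```csharp
using System;
using HDF.PInvoke;

static class DatasetTypeCheck
{
    // Returns true if the dataset stores 4-byte floating-point values.
    public static bool IsFloat32(long dataSetId)
    {
        long typeId = H5D.get_type(dataSetId);
        try
        {
            return H5T.get_class(typeId) == H5T.class_t.FLOAT
                && H5T.get_size(typeId).ToInt64() == sizeof(float);
        }
        finally
        {
            // The type handle returned by H5D.get_type must be closed.
            H5T.close(typeId);
        }
    }
}
```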

Soonts
  • Thanks so much! The `using` doesn't work for me (it's complaining that `GCHandle` cannot be converted to `IDisposable`), but the rest works. Thanks! – Kris Aug 02 '19 at 08:12

Alternatively, you may want to take a look at HDFql, as it hides the low-level details of HDF5. Your solution (posted above) could be rewritten and simplified using HDFql as follows:

using System;
using System.Runtime.InteropServices;
using AS.HDFql;   // use HDFql namespace (make sure it can be found by the C# compiler)

namespace MyNamespace
{
    class Program
    {
        static void Main()
        {
            // dims
            int h = 162;
            int w = 128;

            // read array
            float[] arrFlat = new float[h * w];

            HDFql.Execute("SELECT FROM \\path\\to\\weights.h5 \"/dense1/dense1/kernel:0\" INTO MEMORY " + HDFql.VariableTransientRegister(arrFlat));        

            // reshape
            float[,] arr = new float[h, w];  // row-major
            for (int i = 0; i < h; i++)
            {
                for (int j = 0; j < w; j++)
                {
                    arr[i, j] = arrFlat[i * w + j];
                }
            }

            // show one entry
            Console.WriteLine(arr[13, 87].ToString());
            Console.WriteLine(arrFlat[13 * w + 87].ToString());

            // Keep the console window open in debug mode.
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

Additional examples on how to read datasets using HDFql can be found in the quick start guide and reference manual.

SOG