JHDF5 - How to avoid dataset being overwritten

Question

I am using JHDF5 to log a collection of values to an HDF5 file. I am currently using two ArrayLists to do this, one with the values and one with the names of the values.

ArrayList<String> valueList = new ArrayList<String>(); 
ArrayList<String> nameList = new ArrayList<String>();

valueList.add("Value1");
valueList.add("Value2");
nameList.add("Name1");
nameList.add("Name2");

IHDF5Writer writer = HDF5Factory.configure("My_Log").keepDataSetsIfTheyExist().writer();    
HDF5CompoundType<List<?>> type = writer.compound().getInferredType("", nameList, valueList);        

writer.compound().write("log1", type, valueList);       
writer.close();

This logs the values correctly to the file My_Log, in the dataset "log1". However, this example always overwrites the previous log of the values in the dataset "log1". I want to be able to log to the same dataset every time, adding the latest log to the next line/index of the dataset. For example, if I were to change the value of "Name2" to "Value3" and log the values, and then change "Name1" to "Value4" and "Name2" to "Value5" and log the values, the dataset should look like this:

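[Image: the dataset "log1" shown as a table with columns Name1 and Name2 and one row per log entry: (Value1, Value2), (Value1, Value3), (Value4, Value5)]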

I thought the keepDataSetsIfTheyExist() option would prevent the dataset from being overwritten, but apparently it doesn't work that way.

Something similar to what I want can be achieved in some cases with writer.compound().writeArrayBlock(), by specifying the index at which the array block should be written. However, this solution doesn't seem to be compatible with my current code, where I have to use lists for handling my data.

Is there some option to achieve this that I have overlooked, or can't this be done with JHDF5?

score 0 · Answer 1 · answered Jan 21 '15 at 15:52

I don't think that will work. It is not quite clear to me, but I believe the getInferredType() you are using creates a dataset with 2 name -> value entries, so it is effectively creating a single object inside the HDF5 file. The best solution I could come up with was to read the previous values and add them to the valueList before writing:

    ArrayList<String> valueList = new ArrayList<>();

    valueList.add("Value1");
    valueList.add("Value2");

    try (IHDF5Reader reader = HDF5Factory.configure("My_Log.h5").reader()) {
        String[] previous = reader.string().readArray("log1");
        for (int i = 0; i < previous.length; i++) {
            valueList.add(i, previous[i]);
        }
    } catch (HDF5FileNotFoundException ex) {
        // Nothing to do here.
    }

    MDArray<String> values = new MDArray<>(String.class, new long[]{valueList.size()});
    for (int i = 0; i < valueList.size(); i++) {
        values.set(valueList.get(i), i);
    }

    try (IHDF5Writer writer = HDF5Factory.configure("My_Log.h5").writer()) {
        writer.string().writeMDArray("log1", values);
    }

If you call this code a second time with "Value3" and "Value4" instead, you will get 4 values. However, this sort of solution might become unwieldy once you start to have hierarchies of datasets.

Thank you for your answer! Alas, this is not the setup I was looking for, and perhaps I was too vague in my question (which I will edit later when I have time). The key thing is that the values need to correspond to the names. When I write to my log file, the values should be inserted in a new row and each value in their corresponding name column. — HigiPha, Jan 21 '15 at 16:41

SOG · Answer 2 · 2021-07-02T08:52:40.037

To solve your issue, you need to define the dataset log1 as extendible so that it can store an unknown number of log entries (that are generated over time) and write these using a point or hyperslab selection (otherwise, the dataset will be overwritten).
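
To make the difference between the two write modes concrete, here is a minimal in-memory sketch in plain Java. It is only a model of the idea, not JHDF5 or HDFql code: the class ExtendibleLogModel and its methods are hypothetical names made up for this illustration, and a real extendible dataset lives in the file rather than in a list. It shows that overwriting replaces the whole dataset, while appending first extends the dataset by one row and then writes only into that new last row, with each value placed in the column matching its name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for an extendible compound dataset.
// This is NOT a JHDF5 or HDFql class; it only models the behaviour.
final class ExtendibleLogModel {
    private final List<String> columns;                    // fixed column names
    private final List<String[]> rows = new ArrayList<>(); // one entry per logged row

    ExtendibleLogModel(List<String> columns) {
        this.columns = new ArrayList<>(columns);
    }

    // Overwrite semantics: the whole dataset is replaced by a single row.
    void overwrite(List<String> names, List<String> values) {
        rows.clear();
        rows.add(toRow(names, values));
    }

    // Append semantics: extend the dataset by one row, then write the new
    // entry into that last row only (earlier rows are left untouched).
    void append(List<String> names, List<String> values) {
        rows.add(null);                                    // extend by one row
        rows.set(rows.size() - 1, toRow(names, values));   // write at the last index
    }

    // Places each value in the column that matches its name.
    private String[] toRow(List<String> names, List<String> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values differ in length");
        }
        String[] row = new String[columns.size()];
        for (int i = 0; i < names.size(); i++) {
            int col = columns.indexOf(names.get(i));
            if (col < 0) {
                throw new IllegalArgumentException("Unknown column: " + names.get(i));
            }
            row[col] = values.get(i);
        }
        return row;
    }

    int rowCount() {
        return rows.size();
    }

    String[] row(int index) {
        return rows.get(index).clone();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Name1", "Name2");
        ExtendibleLogModel log = new ExtendibleLogModel(names);
        log.append(names, Arrays.asList("Value1", "Value2"));
        log.append(names, Arrays.asList("Value1", "Value3"));
        log.append(names, Arrays.asList("Value4", "Value5"));
        for (int i = 0; i < log.rowCount(); i++) {
            System.out.println(String.join(", ", log.row(i)));
        }
    }
}
```

Running main prints the three rows from the question in the order they were logged. Calling overwrite instead of append would leave a single row, which is the behaviour the question describes.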

If you are not bound to a specific technology for handling HDF5 files, you may wish to take a look at HDFql, which is a high-level language for managing HDF5 files easily. A possible solution for your use case using HDFql (in Java) is:

// import the HDFql Java wrapper (it must be on the classpath)
import as.hdfql.*;

public class Example
{

    // static so that it can be instantiated from the static method 'main'
    public static class Log
    {
        String name1;
        String name2;
    }

    // placeholder that populates 'log' with an entry; it should return false
    // once there is nothing left to log (as written, it always returns true)
    public static boolean doSomething(Log log)
    {
        log.name1 = "Value1";
        log.name2 = "Value2";
        return true;
    }

    public static void main(String[] args)
    {
        // declare variables
        Log log = new Log();
        int variableNumber;

        // create an HDF5 file named 'My_Log.h5' and use (i.e. open) it
        HDFql.execute("CREATE AND USE FILE My_Log.h5");

        // create an extendible HDF5 dataset named 'log1' of data type compound
        HDFql.execute("CREATE DATASET log1 AS COMPOUND(name1 AS VARCHAR, name2 AS VARCHAR)(0 TO UNLIMITED)");

        // register variable 'log' for subsequent usage (by HDFql)
        variableNumber = HDFql.variableRegister(log);

        // call function 'doSomething' that does something and populates variable 'log' with an entry
        while (doSomething(log))
        {
            // alter (i.e. extend) dataset 'log1' to +1 (i.e. add a new row)
            HDFql.execute("ALTER DIMENSION log1 TO +1");

            // insert (i.e. write) data stored in variable 'log' into dataset 'log1' using a point selection
            HDFql.execute("INSERT INTO log1(-1) VALUES FROM MEMORY " + variableNumber);
        }
    }

}
