
Suppose I have a table with different columns. These columns have different types. For example:

col1    col2      col3
1.1     "hello"   1
2.2     "world"   2
...

Here col1 is a float or double, col2 a string, and col3 an int. Now I want to store this data in an HDF5 file using C++. How can I do this?

I have already found this post asking the same question for Python, but I cannot figure out how to do it in C++, because, as far as I can tell, you have to specify a single type for the entire dataset.

Edit: And is there a way to have a column containing arrays, say arrays of integers? For example:

col1    col2      col3    col4
1.1     "hello"   1       {1, 3}
2.2     "world"   2       {1, 2, 3}
...
Tom de Geus
Igor
  • Igor, HDF5 supports tables with different datatypes in each column. I use Python and these are defined with NumPy record arrays. I'm not a C++ developer. As I understand, you need to create a data struct that defines the different variable types. – kcw78 Apr 13 '19 at 21:37
  • I'm not sure how to do this (and I don't think that the wrapper library [HighFive](https://github.com/BlueBrain/HighFive) that I use to read/write in C++ can do this). A workaround could be to create one dataset per column and group these datasets. – Tom de Geus Apr 15 '19 at 14:14
  • I am unsure about the first part of your question; maybe you'll find something helpful here: https://support.hdfgroup.org/HDF5/doc/cpplus_RM/examples.html. As for the second part, there is, and it is explained here on Stack Overflow: https://stackoverflow.com/questions/35580279/how-to-create-multi-value-attribute-in-a-hdf5-file-using-c-api – PStarczewski Apr 15 '19 at 21:01

1 Answer


Please take a look at HDFql, as it shields you from the low-level details of handling HDF5 compound datasets. Using HDFql in C++, your issue can be solved as follows:

// HDFql C++ header plus the standard headers needed by the calls below
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cstddef>
#include "HDFql.hpp"


// define structure matching one row of the table
struct data
{
    float col1;
    char *col2;
    int col3;
    HDFQL_VARIABLE_LENGTH col4;
};


// declare variables
char script[1024];
struct data values[2];
int *address;


// create a dataset named "my_dataset" of data type compound of one dimension (size 2) composed of four members (col1, col2, col3 and col4)
HDFql::execute("CREATE DATASET my_dataset AS COMPOUND(col1 AS FLOAT, col2 AS VARCHAR, col3 AS INT, col4 AS VARINT)(2)");


// populate variable "values"
values[0].col1 = 1.1f;
values[0].col2 = (char *) malloc(10);
strcpy(values[0].col2, "hello");
values[0].col3 = 1;
address = (int *) malloc(2 * sizeof(int));
*address = 1;
*(address + 1) = 3;
values[0].col4.address = address;
values[0].col4.count = 2;

values[1].col1 = 2.2f;
values[1].col2 = (char *) malloc(10);
strcpy(values[1].col2, "world");
values[1].col3 = 2;
address = (int *) malloc(3 * sizeof(int));
*address = 1;
*(address + 1) = 2;
*(address + 2) = 3;
values[1].col4.address = address;
values[1].col4.count = 3;


// insert (i.e. write) values from variable "values" into dataset "my_dataset"
// insert (i.e. write) values from variable "values" into dataset "my_dataset";
// sizeof/offsetof return size_t, so cast to int to match the %d conversions
snprintf(script, sizeof(script), "INSERT INTO my_dataset VALUES FROM MEMORY %d SIZE %d OFFSET(%d, %d, %d, %d)", HDFql::variableTransientRegister(values), (int) sizeof(struct data), (int) offsetof(struct data, col1), (int) offsetof(struct data, col2), (int) offsetof(struct data, col3), (int) offsetof(struct data, col4));
HDFql::execute(script);

Optionally, you could make the dataset extendable in its first dimension (by declaring it with an unlimited size) and then write one row, extend the dimension by one, write another row, and so on.

Additional details and examples on how to use HDFql may be found in the reference manual.

SOG