I am trying to fill a dataset in an HDF5 file iteratively using HDFql. By "iteratively" I mean that my simulator occasionally comes along with an update, and I want to dump some more data (held in a std::vector) into my dataset. Strangely though, something breaks after a few 'iterations' and my dataset begins to just fill with zeros.
Luckily, the error also occurs in a minimal example and seems to be reproducible with the code below:
#include <cstdint>
#include <random>
#include <sstream>
#include <vector>
#include <HDFql.hpp>

int main(int argc, const char *argv[])
{
    HDFql::execute("CREATE TRUNCATE FILE /tmp/test_random.h5");
    HDFql::execute("USE FILE /tmp/test_random.h5");
    HDFql::execute("CREATE GROUP data");
    HDFql::execute("CREATE CHUNKED DATASET data/vals AS SMALLINT(UNLIMITED)");
    HDFql::execute("CLOSE FILE");

    std::stringstream ss;
    std::random_device rd;
    std::mt19937 eng(rd());
    std::uniform_int_distribution<> dist_vals(0, 500);
    std::uniform_int_distribution<> dist_len(300, 1000);

    for (int i = 0; i < 500; i++)
    {
        const int num_values = dist_len(eng);
        std::vector<uint16_t> vals;
        for (int j = 0; j < num_values; j++)
        {
            const int value = dist_vals(eng);
            vals.push_back(value);
        }

        HDFql::execute("USE FILE /tmp/test_random.h5");

        ss << "ALTER DIMENSION data/vals TO +" << vals.size();
        HDFql::execute(ss.str().c_str()); ss.str("");

        ss << "INSERT INTO data/vals(-" << vals.size() << ":1:1:" << vals.size()
           << ") VALUES FROM MEMORY "
           << HDFql::variableTransientRegister(vals.data());
        HDFql::execute(ss.str().c_str()); ss.str("");

        HDFql::execute("CLOSE FILE");
    }
}
This code runs for 500 'iterations', each time filling the vals vector with a random number of random values. In my latest run, everything beyond data cell 4065 in the final output HDF5 file was just zeros.
So my question is: what am I doing wrong here? Many thanks!
Edit
On further experimentation, I have come to suspect that this is a bug in HDFql. Consider the following example:
#include <cstdint>
#include <iostream>
#include <random>
#include <sstream>
#include <vector>
#include <HDFql.hpp>

int main(int argc, const char *argv[])
{
    HDFql::execute("CREATE TRUNCATE FILE /tmp/test_random.h5");
    HDFql::execute("USE FILE /tmp/test_random.h5");
    HDFql::execute("CREATE CHUNKED DATASET data/vals AS SMALLINT(0 TO UNLIMITED)");

    std::stringstream ss;
    std::random_device rd;
    std::mt19937 eng(rd());
    std::uniform_int_distribution<> dist_vals(0, 450);
    std::uniform_int_distribution<> dist_len(100, 300);

    int total_added = 0;
    for (int i = 0; i < 5000; i++)
    {
        const int num_values = 1024; // dist_len(eng);
        std::vector<uint16_t> vals;
        for (int j = 0; j < num_values; j++)
        {
            const int value = dist_vals(eng);
            vals.push_back(value);
        }

        long long dim = 0;
        ss << "SHOW DIMENSION data/vals INTO MEMORY "
           << HDFql::variableTransientRegister(&dim);
        HDFql::execute(ss.str().c_str()); ss.str("");

        ss << "ALTER DIMENSION data/vals TO +" << vals.size();
        HDFql::execute(ss.str().c_str()); ss.str("");

        ss << "INSERT INTO data/vals(-" << vals.size() << ":1:1:" << vals.size()
           << ") VALUES FROM MEMORY "
           << HDFql::variableTransientRegister(vals.data());
        HDFql::execute(ss.str().c_str()); ss.str("");

        total_added += vals.size();
        std::cout << i << ": " << ss.str() << ": dim = " << dim
                  << " : added = " << vals.size() << " (total="
                  << total_added << ")" << std::endl;
    }
    HDFql::execute("CLOSE FILE");
}
This code keeps the size of each insert constant at 1024 elements (num_values = 1024;) and works fine. However, if this is changed to 1025, the bug appears, as evidenced by the console output:
....
235: : dim = 240875 : added = 1025 (total=241900)
236: : dim = 241900 : added = 1025 (total=242925)
237: : dim = 0 : added = 1025 (total=243950)
238: : dim = 0 : added = 1025 (total=244975)
239: : dim = 0 : added = 1025 (total=246000)
....
This indicates that something breaks at iteration 237, since the dimension of the dataset is clearly not zero at that point: SHOW DIMENSION suddenly starts reporting 0 even though the preceding ALTER commands all succeeded.
Weirdly, this does not explain why I was having the problem in the original example, since there each insert was capped at 1000 values, i.e. below the 1025 threshold.