I want to write a 2D vector of doubles to an HDF5 file. I used the following code (writeh5.cpp):

#include <cstdlib> 
#include <ctime> 
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <H5Cpp.h>

using namespace H5;
using namespace std;

int main(void) {
  int nrow = 5;
  int ncol = 4;

  vector<vector< double > > vec2d;
  vec2d.resize(nrow, vector<double>(ncol, 0.0));

  srand((unsigned)time(0));

  typename vector< vector< double > >::iterator row;
  typename vector< double >::iterator col;
  for (row = vec2d.begin(); row != vec2d.end(); row++) {
    cout << endl;
    for (col = row->begin(); col != row->end(); col++) {

      *col = (rand()/(RAND_MAX+1.0));
      cout << *col << '\t';
    }
  }
  cout << endl;

  H5File file("test.h5", H5F_ACC_TRUNC);

  // dataset dimensions
  hsize_t dimsf[2];
  dimsf[0] = nrow;
  dimsf[1] = ncol;
  DataSpace dataspace(2, dimsf);

  DataType datatype(H5::PredType::NATIVE_DOUBLE);
  DataSet dataset = file.createDataSet("data", datatype, dataspace);

  // dataset.write(vec2d.data(), H5::PredType::NATIVE_DOUBLE);
  dataset.write(&vec2d[0][0], H5::PredType::NATIVE_DOUBLE);

  cout << endl << " vec2d has " << endl;
  for (row = vec2d.begin(); row != vec2d.end(); row++) {
      cout << endl;
      for (col = row->begin(); col != row->end(); col++) {            

        cout << *col << '\t';
      }
  }
  cout << endl;

  dataset.close();
  dataspace.close();
  file.close();

  return 0;
}

I compiled it using g++ writeh5.cpp -I/usr/include/hdf5/ -lhdf5_cpp -lhdf5 -Wall

A run of the code produced the following output:

0.325553        0.598941        0.364489        0.0125061
0.374205        0.0319419       0.380329        0.815621
0.863754        0.386279        0.0173515       0.15448
0.703936        0.372486        0.728436        0.991631
0.666207        0.568983        0.807475        0.964276

It also created the file test.h5.

Then, when I read this file from Python (using the following)

import h5py
import numpy as np

file = h5py.File("test.h5", 'r')
dataset = np.array(file["data"])

print(dataset)

file.close()

I got

 [[  3.25553381e-001   5.98941262e-001   3.64488814e-001   1.25061036e-002]
 [  0.00000000e+000   2.42092166e-322   3.74204732e-001   3.19418786e-002]
 [  3.80329057e-001   8.15620518e-001   0.00000000e+000   2.42092166e-322]
 [  8.63753530e-001   3.86278684e-001   1.73514970e-002   1.54479635e-001]
 [  0.00000000e+000   2.42092166e-322   7.03935940e-001   3.72486182e-001]]

The first row is correct; the other rows are garbage.

I also tried dataset.write(&vec2d[0]... and dataset.write(vec2d[0].data()..., and got similar problems.

I want to

  1. Write an HDF5 file with the contents of a 2D std::vector of doubles,
  2. Read the file in Python and store the contents in a numpy array

What am I doing wrong?

Caos21
  • The python code is OK so far. You could also write `dataset = file["data"][:]` as that will dump the HDF5 dataset into the variable `dataset` as a numpy array (don't need to *cast* to a numpy array). – Imanol Luengo Sep 08 '15 at 13:44

3 Answers

Apparently, I am not allowed to pass a std::vector of vectors to the write function. Copying the elements of the vector into a plain 2D array solves the problem, because the write function happily accepts this array.

However, I am not happy with this solution; I expected to be able to pass the vectors directly to the write function.

Here is the code:

#include <cstdlib> 
#include <ctime> 
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <H5Cpp.h>

using namespace H5;
using namespace std;

int main(void) {
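  // const, so that the array declared from these sizes below is standard C++ (not a VLA)
  const int nrow = 5;
  const int ncol = 4;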

  vector<vector< double > > vec2d;
  vec2d.resize(nrow, vector<double>(ncol, 0.0));

  srand((unsigned)time(0));

  // generate some data
  typename vector< vector< double > >::iterator row;
  typename vector< double >::iterator col;
  for (row = vec2d.begin(); row != vec2d.end(); row++) {
    cout << endl;
    for (col = row->begin(); col != row->end(); col++) {            
        *col = (rand()/(RAND_MAX+1.0));
        cout << *col << '\t';
    }
  }
  cout << endl;

  // copy the data into a contiguous 2D array that can be written in one call
  double varray[nrow][ncol];
  for (int i = 0; i < nrow; ++i) {
    for (int j = 0; j < ncol; ++j) {
      varray[i][j] = vec2d[i][j];
    }
  }

  H5File file("test.h5", H5F_ACC_TRUNC);

  // dataset dimensions
  hsize_t dimsf[2];
  dimsf[0] = nrow;
  dimsf[1] = ncol;
  DataSpace dataspace(2, dimsf);

  DataType datatype(H5::PredType::NATIVE_DOUBLE);
  DataSet dataset = file.createDataSet("data", datatype, dataspace);

  dataset.write(varray, H5::PredType::NATIVE_DOUBLE);


  cout << endl;

  dataset.close();
  dataspace.close();
  file.close();
  return 0;
}
Guillaume Jacquenot
Caos21

I ran into the same problem when I converted my data from a vector to a dynamic 2D array. The problem is not that the HDF5 write call refuses a vector; it is that it has no notion of an array of pointers. It only writes out a single contiguous block of memory. A vector of vectors is not contiguous in memory: the outer vector holds vector objects, each of which points to its own separately allocated row. That is why, when you passed the address of the first element, the first row came out correct. The rest of the table is whatever happened to follow the first row's buffer in memory.
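To make the layout point concrete, here is a small sketch that checks whether the rows of a vector of vectors happen to sit back to back in memory. The helper name `rows_are_contiguous` is made up for this illustration and is not part of HDF5 or the standard library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of HDF5): returns true only if each row's
// buffer begins exactly where the previous row's buffer ends, i.e. only if
// the rows happen to form one unbroken run of doubles in memory.
bool rows_are_contiguous(const std::vector<std::vector<double>>& table) {
    for (std::size_t i = 1; i < table.size(); ++i) {
        const double* end_of_previous = table[i - 1].data() + table[i - 1].size();
        if (end_of_previous != table[i].data()) {
            return false;  // there is a gap (or unrelated memory) between rows
        }
    }
    return true;
}
```

Calling `rows_are_contiguous(vec2d)` on the 5x4 table from the question will typically return false with common allocators, because each row is its own heap allocation, although the standard does not promise either result.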

My solution was creating a giant 1D vector and performing my own indexing to convert back and forth. This is similar to the approach in h5_writedyn.c https://www.hdfgroup.org/ftp/HDF5/examples/misc-examples/h5_writedyn.c
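A minimal sketch of that 1D approach, assuming a rectangular table stored row by row. The names `flatten_rows` and `flat_index` are hypothetical helpers invented for this example, not HDF5 functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy a rectangular vector of vectors into one
// contiguous, row-major buffer. Throws if the rows differ in length.
std::vector<double> flatten_rows(const std::vector<std::vector<double>>& table) {
    const std::size_t nrow = table.size();
    const std::size_t ncol = (nrow > 0) ? table[0].size() : 0;
    std::vector<double> flat;
    flat.reserve(nrow * ncol);
    for (const auto& row : table) {
        if (row.size() != ncol) {
            throw std::invalid_argument("all rows must have the same length");
        }
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Hypothetical helper: position of element (i, j) in the row-major buffer.
inline std::size_t flat_index(std::size_t i, std::size_t j, std::size_t ncol) {
    return i * ncol + j;
}

// With the dataset from the question, the flattened buffer could then be
// written in one call (assumption: same 2D dataspace of nrow x ncol):
//   std::vector<double> flat = flatten_rows(vec2d);
//   dataset.write(flat.data(), H5::PredType::NATIVE_DOUBLE);
```

The dataspace stays two-dimensional; only the in-memory buffer is 1D, so the Python side should still see an array of shape (nrow, ncol).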

crazywill32

What is this?

gives

0.325553        0.598941        0.364489        0.0125061
0.374205        0.0319419       0.380329        0.815621
0.863754        0.386279        0.0173515       0.15448
0.703936        0.372486        0.728436        0.991631
0.666207        0.568983        0.807475        0.964276

I don't see a print statement in your C++ code. Did you read the file with some other tool?

(yes, this is a clarifying question, but it requires too much formatting to fit in a comment).


https://stackoverflow.com/a/24622720/901925 Writing 2-D array int[n][m] to HDF5 file using Visual C++

The solution talks about writing a vector of vectors. It also talks about writing variable length arrays.

You may have to put the dataset write inside a row loop:

for (row = vec2d.begin(); row != vec2d.end(); row++) {
      // untested; each call would also need a memory dataspace and a file
      // hyperslab selecting the current row
      dataset.write(row->data(), H5::PredType::NATIVE_DOUBLE);
}
Community
hpaulj
  • This is one of the results from running the C++ code I posted above. I tried to generate some random data and store it in an HDF5 file. It's only to show that I do not recover in Python the data I wrote using my C++ code. I guess the error is in the C++ snippet. – Caos21 Sep 08 '15 at 02:21
  • Do you need some sort of `flush` and `close` in the c++? – hpaulj Sep 08 '15 at 02:48
  • Sorry, I missed `cout << *col << '\t'`. I've been working in Python so long that I forgot about c++ syntax. So you are showing the data that you will write. But it would be good to see a non-python display of the h5 file. – hpaulj Sep 08 '15 at 02:56
  • Thanks, I added file.close(), but I have the same problem. – Caos21 Sep 08 '15 at 03:01
  • The h5 file is generated by the C++ code. I am pretty sure that vec2d has the intended values, but it seems to me that the problem is in writing the h5 file. – Caos21 Sep 08 '15 at 03:06
  • I found another SO question that might apply. – hpaulj Sep 08 '15 at 07:40