Removing potentially redundant template parameters when loading a binary file into Eigen matrix

Question

I want to read a plain binary file containing a number of unsigned 16-bit integers into an Eigen matrix, and I wrote a templated utility to do this. This is what the caller looks like:

Matrix<uint16_t, Dynamic, Dynamic> data;
int retval = read_data<Matrix<uint16_t, Dynamic, Dynamic>, uint16_t>(
    argv[1], data);

And here's what read_data looks like:

template <typename Derived, typename Scalar> // Per @Jarod42, get rid of Scalar here (*)
int read_data(const char* const fname, MatrixBase<Derived>& data) {
    // (*) If we don't have Scalar as a template, just uncomment this:
    // typedef typename Derived::Scalar Scalar;

    ifstream fin(fname, ios::binary);
    if (!fin) {
        return 2;
    }

    fin.seekg(0, fin.end);
    long long bytes = fin.tellg();
    if (bytes % sizeof(Scalar) != 0) {
        // The available number of bytes won't fill an even number of Scalar
        // values
        return 3;
    }
    long long nscalars = bytes / sizeof(Scalar);

    // See http://forum.kde.org/viewtopic.php?f=74&t=107551
    MatrixBase<Derived>& data_edit = const_cast<MatrixBase<Derived>&>(data);
    data_edit.derived().resize(nscalars, 1);

    Scalar* buffer = new Scalar[nscalars]; // Switched to vector per @Casey

    fin.seekg(0, fin.beg);
    fin.read(reinterpret_cast<char*>(buffer), bytes);
    if (!fin) {
        // All data not read. fin.gcount() will indicate bytes read.
        return 4;
    }

    for (long long idx = 0; idx < nscalars; ++idx) {
        data_edit(idx) = buffer[idx];
    }

    return 0;
}

In brief,

the file is opened,
its size is obtained,
an array is dynamically allocated to store all the data,
the file is read into the array,
the array's contents are copied into the matrix.

This is reasonable and it works (though I'm open to suggestions for improvements), but I think the function has one too many template parameters, and the function call in the caller is just too verbose. I think there should be a way to eliminate the second template parameter, which only serves to tell read_data the number of bytes per scalar (2 in the case of uint16_t), and which I believe should be inferrable using the first template parameter.

Questions Is there a way to eliminate the seemingly redundant second template parameter to read_data?

Also, is my approach of passing in a matrix reference only to resize it in the read_data function (using the verbose and confusing idiom of creating a modifiable reference to the matrix in order to resize it via derived()) the right way to proceed? I realize this dynamically allocates memory, which is fine, but I think it is not doing anything wasteful---correct?.

(Discussion Is there other improvements to this code one would like to see? I'm a C or Python numerical coder; in C, I'd just deal with void* arrays and pass an extra function argument telling the function the size of each scalar; with Python I'd just do numpy.fromfile('path/to/file.bin', dtype=numpy.uint16) and be done with it. But I'd like to do it right by Eigen and C++.)

NB. I use matrixes instead of vectors because I'll be resizing them into rectangular matrixes later.

NB2. In Fixed Sized Eigen types as parameters the notion of templating the function using the scalar type is promoted. I am not averse to this approach, I chose to pass read_data a matrix reference instead of making it return a Matrix object because I wanted integer return values indicating errors---though now I realize I ought to make those exceptions.

NB3. In c++ check for nested typedef of a template parameter to get its scalar base type an elaborate set of helper classes is used to, I think, achieve a similar effect for templated classes. I'm curious if a simpler approach can be used here, for a templated function.

NB4. A simple improvement that I'm aware of is typedeffing Matrix<uint16_t, Dynamic, Dynamic> to reduce verbosity.

`Scalar* buffer = new Scalar[nscalars];` is leaking - use `std::vector buffer(nscalars);` ... `fin.read(reinterpret_cast(buffer.data()), bytes);` — Casey, Jan 18 '14 at 05:23
What is the point of `MatrixBase& data_edit = const_cast&>(data);`? `data` is already a non-`const` reference. — Casey, Jan 18 '14 at 05:27
@Casey, doh, thanks. Is there an advantage to using `vector` over just `delete(buffer)` in this situation? — Ahmed Fasih, Jan 18 '14 at 05:28
Yes, it won't leak when you return early. Or throw an exception. Or call some other code that throws an exception. — Casey, Jan 18 '14 at 05:28
@Casey, I am not really sure why I need that step---shows both my limited knowledge of C++ and eigen---but the URL in the comment above it gave me that incantation and it's needed to be able to call `resize` (actually an allocation). I can't call `resize` on data directly for some reason, assertions are unmet. — Ahmed Fasih, Jan 18 '14 at 05:29

score 2 · Accepted Answer · answered Jan 18 '14 at 13:21

I think that Matrix<uint16_t, Dynamic, Dynamic> should have a typedef to retrieve uint16_t. If so, you may write as first line (assuming it is type):

typedef typename Derived::type Scalar;

If there is no typedef, you may write a traits for that:

template <typename> struct matrix_type;

template <typename T, typename U1, typename U2>
struct matrix_type<Matrix<T, U1, U2> >
{
    typedef T type;
    typedef U1 type1;
    typedef U2 type2;
};

And then write

typedef typename matrix_type<Derived>::type Scalar;

Note: If you have written the parameters in the other order

template <typename Derived, typename Scalar>
int read_data(const char* const fname, MatrixBase<Derived>& data);

you may write (as the last parameter may be deduced):

Matrix<uint16_t, Dynamic, Dynamic> data;
int retval = read_data<uint16_t>(argv[1], data);

And now that the parameter is obsolete, you may write directly:

Matrix<uint16_t, Dynamic, Dynamic> data;
int retval = read_data(argv[1], data);

Bang on. The typedef to get the base type for a `MatrixBase` is `Scalar`: http://eigen.tuxfamily.org/dox/MatrixBase_8h_source.html so `read_data` can have just one template parameter and inside, it can have `typedef typename Derived::Scalar Scalar;`. — Ahmed Fasih, Jan 18 '14 at 15:55
Thanks for explaining how one can use traits for this problem too, I'll read up on this. — Ahmed Fasih, Jan 18 '14 at 16:12

score 0 · Answer 2 · answered Jan 18 '14 at 06:01

0

You may want to look at map in eigen. It takes a chuck a memory directly.

TutorialMapClass.html">http://eigen.tuxfamily.org/dox/group_TutorialMapClass.html

To construct a Map variable, you need two other pieces of information: a pointer to the region of memory defining the array of coefficients, and the desired shape of the matrix or vector. For example, to define a matrix of float with sizes determined at compile time, you might do the following:

Map mf(pf,rows,columns);

answered Jan 18 '14 at 06:01

user3195397

226
1
2

Because Map uses the original memory containing the data, and I was going to do a few potentially in-place manipulation of the data, it wasn't immediately clear if I should use that over a proper Matrix object with dynamically-allocated memory of its own. It might make sense to nail down what exactly I want to do with this data and then evaluate whether Map will suffice; perhaps Matrix gives me the most freedom in the exploratory programming period. – Ahmed Fasih Jan 18 '14 at 06:06
What I like Eigen is that it takes care of memory allocation all by itself. You can still do in-place data manu, throwing away the whole map if data is not expected. One thing you may look out for is big and small endian. If chuck of data is generated in big endian machine, whild you read in small endian, then it will fail. – user3195397 Jan 18 '14 at 06:11

Removing potentially redundant template parameters when loading a binary file into Eigen matrix

2 Answers2