Is it possible to add/subtract arrays read from a data file?

Question

I want to perform some mathematical operations such as addition/subtraction, mean etc. on the data read from the file. The data is two dimensional, looks something like this: So now for example I want to subtract first row from the 3rd or 4th, it is very simple in python using indexing, but I have no idea how to do in c++. Here is the code I'm using.

#include <boost/multi_array.hpp>
#include <boost/timer/timer.hpp>
#include <boost/range/irange.hpp>
#include <h5xx/h5xx.hpp>
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <string>

using array_2d_t = boost::multi_array<float, 2>;

template <typename T> 
void print_array(T const& array)
{
    for (auto const& row : array) 
        { for (auto v : row)
            printf("%10f ", v);
        printf("\n"); //prints a new line similar t0 \n
    }
    std::cout << "\n End of file" << std::endl;
}

array_2d_t read_frame(std::string const& filename, unsigned frame_no) {
    //h5xx::file xaa("../../data/xaa.h5", h5xx::file::mode::in);
    h5xx::file xaa(filename, h5xx::file::mode::in);

    h5xx::group   g(xaa, "particles/lipids/box/positions");
    h5xx::dataset ds(g, "value");

    // determine dataset shape: frames, particle count, space dimension
    auto ds_shape = h5xx::dataspace(ds).extents<3>();
    array_2d_t arr(boost::extents[ds_shape[1]][ds_shape[2]]);

    std::vector<hsize_t> offsets{frame_no, 0, 0};
    std::vector<hsize_t> counts{1, arr.shape()[0], arr.shape()[1]};
    //std::vector<hsize_t> strid{1, arr.shape()[0], arr.shape()[1]};
    // slice ohne stride:
    h5xx::slice slice(offsets, counts);
    h5xx::read_dataset(ds, arr, slice);
    return arr;
}

int main(int argc, char const* argv[])
{
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " input.h5" << std::endl;
        return -1;
    }
    std::string filename(argv[1]);
    print_array(read_frame(filename, 1));
    return 0; 
}

read_frame function takes the argument filename and frame_no. The file is divided into frames each containing (11214) rows, but it is not important to know here. So read_frame returns the two dimensional data for every frame. In the main() function I can print the data for every frame, but now suppose I want to subtract the first frame from the second, how can I do that? Does anyone know ho

Can you show the Python? That will make it clear /what/ you want to achieve, not how you think it should have worked — sehe, Oct 27 '21 at 22:09
@sehe to be precise I want to take the average of first 4 frames and then subtract that from the fifth frame. But first I just want to learn about the basic operations on these frames. — Mahesh, Oct 27 '21 at 22:18
That's ... not what I asked for. But you know, I'll add in a few — sehe, Oct 27 '21 at 22:22

score 1 · Accepted Answer · answered Oct 27 '21 at 22:19

So, what the data looks like in text form is very irrelevant. What's relevant is that you have to arrays of the same dimensions and want to apply element-wise arithmetic operations to it.

You can use a proper Matrix library (like Boost UBlas, Eigen, Armadillo etc.).

Or you can write them for Boost MultiArray: How to execute mathematical operations between two boost::multi_arrays (C++)?

Here's a sample:

#include <algorithm>
#include <boost/multi_array.hpp>
#include <boost/range/irange.hpp>
#include <boost/timer/timer.hpp>
#include <h5xx/h5xx.hpp>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace ArrayOperators {
    template <typename Op> struct ArrayOp {
        Op op;
        constexpr explicit ArrayOp(Op op) : op(op) {}

        template <typename T, typename Scalar, size_t Dim> 
        constexpr inline auto operator()(
            boost::multi_array<T, Dim> const& l,
            Scalar const& v) const
        {
            std::array<int, Dim> shape;
            std::copy_n(l.shape(), Dim, shape.data());

            using R = boost::multi_array<decltype(op(T{}, v)), Dim>;
            R result(shape);
            
            std::transform(
               l.data(), l.data()+l.num_elements(),
               result.data(),
               [&op=op,v](auto const& el) { return op(el, v); });

            return result;
        }

        template <typename T, typename U, size_t Dim> 
        constexpr inline auto operator()(
            boost::multi_array<T, Dim> const& l,
            boost::multi_array<U, Dim> const& r) const
        {
            std::array<int, Dim> shape;
            std::copy_n(l.shape(), Dim, shape.data());
            assert(std::equal(shape.begin(), shape.end(), r.shape()));

            using R = boost::multi_array<decltype(op(T{}, U{})), Dim>;
            R result(shape);
            
            std::transform(
               l.data(), l.data()+l.num_elements(),
               r.data(), result.data(),
               [&op=op](auto const& v1, auto const& v2) { return op(v1, v2); });

            return result;
        }
    };

    template <typename L, typename R>
    static constexpr inline auto operator+(L const& l, R const& r) {
        return ArrayOp {std::plus<>{}} (l, r); }

    template <typename L, typename R>
    static constexpr inline auto operator-(L const& l, R const& r) {
        return ArrayOp {std::minus<>{}} (l, r); }

    template <typename L, typename R>
    static constexpr inline auto operator/(L const& l, R const& r) {
        return ArrayOp {std::divides<>{}} (l, r); }

    template <typename L, typename R>
    static constexpr inline auto operator*(L const& l, R const& r) {
        return ArrayOp {std::multiplies<>{}} (l, r); }
}

using array_2d_t = boost::multi_array<float, 2>;

template <typename T> 
void print_array(T const& array)
{
    for (auto const& row : array) 
        { for (auto v : row)
            printf("%10f ", v);
        printf("\n"); //prints a new line similar t0 \n
    }
    std::cout << "\n End of file" << std::endl;
}

array_2d_t read_frame(std::string const& filename, unsigned frame_no) {
    //h5xx::file xaa("../../data/xaa.h5", h5xx::file::mode::in);
    h5xx::file xaa(filename, h5xx::file::mode::in);

    h5xx::group   g(xaa, "particles/lipids/box/positions");
    h5xx::dataset ds(g, "value");

    // determine dataset shape: frames, particle count, space dimension
    auto ds_shape = h5xx::dataspace(ds).extents<3>();
    array_2d_t arr(boost::extents[ds_shape[1]][ds_shape[2]]);

    std::vector<hsize_t> offsets{frame_no, 0, 0};
    std::vector<hsize_t> counts{1, arr.shape()[0], arr.shape()[1]};
    //std::vector<hsize_t> strid{1, arr.shape()[0], arr.shape()[1]};
    // slice ohne stride:
    h5xx::slice slice(offsets, counts);
    h5xx::read_dataset(ds, arr, slice);
    return arr;
}

int main(int argc, char const* argv[])
{
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " input.h5" << std::endl;
        return -1;
    }
    std::string filename(argv[1]);

    using namespace ArrayOperators;
    auto avg4 = //
        (read_frame(filename, 0) + read_frame(filename, 1) +
         read_frame(filename, 2) + read_frame(filename, 3)) /
        4.0;

    print_array(read_frame(filename, 4) - avg4);
    return 0; 
}

Prints:

 -0.610001   0.410004  -0.150000 
  0.372501  -0.395000  -0.037500 
 -0.062500   0.027500  -0.122500 
  0.307503   0.184998  -0.212500 
  0.150000   0.009995   0.362500 
 -0.497500   0.197500   0.217500 
  0.405003  -0.185000   0.115000 
 -0.072500   0.247498   0.237500 
 -0.480000   0.034998   0.107500 
 -0.130000  -0.162499  -0.087500 
(...)
 -0.340000  -0.280001  -0.040000 
 -0.572502  -0.747505  -0.005000 
  0.392494   0.012500   0.045000 
  0.500000  -0.040000  -0.162500 
  0.355000   0.700000   0.065000 

 End of file

I'll have you know that by now it becomes probably mandatory to step back and consider the *task* before the random approaches. I suppose none of this is more efficient in C++ than in Python. Also, I suppose this type of number crunching is very typical in some fields (image manipulation, detection, ML). Try asking about the field/application rather than coming to the table with a set of (potentially ill-chosen) tools? — sehe, Oct 27 '21 at 22:26
Thanks! But in case incase the frame_no is close to millions, could it be possible to use for loops ? — Mahesh, Oct 27 '21 at 22:30
Last time I checked C++ has for loops. So, yes. Again, consider using a proper library. Numpy is probably 100x better than this. — sehe, Oct 27 '21 at 22:32
I tried reading frames the way you suggested in your answer, but it is not working! **aka 'boost::multi_array'} is not derived from 'const __gnu_cxx::__normal_iterator<_Iterator, _Container>'**. How it works on your system? — Mahesh, Oct 27 '21 at 22:40
Excellently :) The code is in the answer. I compile using GCC 10, Boost 1.77, but I'm pretty sure it all doesn't really matter. Consider posting the actual code and error. (coliru.com, godbolt.org, wandbox.org, etc Of course none has h5xx) — sehe, Oct 27 '21 at 22:47
there are many errors but one I undestand is **error: invalid operands of types 'void' and 'double' to binary 'operator/'**. Cannot perform mathematical operations on functions! — Mahesh, Oct 27 '21 at 22:50
https://pastebin.ubuntu.com/p/QkgNTGBx4F/here's a sketch with a loop over all the frames in the file. Note: this is atrocious code in terms of efficiency. — sehe, Oct 27 '21 at 23:25
Oh I see. It just got mashed with `here's`. The first link also still works https://pastebin.ubuntu.com/p/QkgNTGBx4F/ — sehe, Oct 28 '21 at 00:05
Hey!! I've a question regarding boost multi_array. If you could help **https://stackoverflow.com/questions/69780791/how-to-store-values-in-the-boost-multi-array-container** — Mahesh, Oct 30 '21 at 18:03

Is it possible to add/subtract arrays read from a data file?

1 Answers1