
The question is quite straightforward. After some trials, here is the most efficient code I found:

//For the sake of the example, I initialize every entry as zero.
vector<float> vector1D(1024 * 768, 0); 
vector<vector<float>> vector2D(768, vector<float>(1024,0));

int counter = 0;
for (int i = 0; i < 768; i++) {
    for (int j = 0; j < 1024; j++) {
        vector2D[i][j] = vector1D[counter++];
    }
}

Is there a faster way?

John Katsantas
  • A very efficient way would be to create a view that provides the interface of a 2D vector whilst still being a 1D vector. – Mansoor Oct 31 '20 at 22:53
  • Instead of the inner loop `for (int j = 0; j < 1024; j++) {` you could try `std::copy`; the compiler might be able to generate code that copies the 1024 elements more efficiently in one step (see the sketch after these comments). But are you sure you really want to have a `vector<vector<float>>`? Normally you want to work on the data stored in such a matrix, and then having that data contiguous in memory is most of the time more efficient. – t.niese Oct 31 '20 at 22:58
  • @M.A Noted. I'll look into it as I've never heard of views before. – John Katsantas Oct 31 '20 at 22:59
  • If you really need to copy the data, I think the way you are doing it is the best. If what you want is to be able to access the data conveniently as if it were a 2D array, you can make a wrapper class that overloads the `operator[]` and returns a `std::span` (if you are using C++20) or just a pointer. – tuket Oct 31 '20 at 23:04
  • @t.niese I tried it but it took longer. As for your question, I'm not sure yet. I'm doing some computationally heavy stuff afterwards. During that part I need to convert my single index (for the 1D vector) into i and j for other reasons. I was trying to avoid this conversion since I'm in a huge nested for loop and, albeit simple, this conversion time adds up to a big amount in the end. – John Katsantas Oct 31 '20 at 23:07
  • `vector2D[i][j]` is something like `vector2D.ptr_to_data[i].ptr_to_data[j]`, with the rows not necessarily being contiguous in memory. This can result in cache misses and be slower than `vector1D[j+i*1024]`. Most of the libraries out there that do heavy computation on matrices store them contiguously in memory. That `vector2D[i][j]` looks simpler than `vector1D[j+i*1024]` does not mean that it is more efficient. – t.niese Oct 31 '20 at 23:11
  • @t.niese Hmm you are right, I hadn't thought about this. I will definitely look into it. So, basically I need to see what's more efficient: 1) calculating j+i*1024 and accessing an element in a 1D vector vs 2) accessing an element in a 2D vector. However, before entering the nested for loop of my iteration, I could pass vector2D[i] into a new 1D vector, let's call it vectorC. That way, I'm still accessing elements in a 1D vector inside my nested for loop using vectorC[j]. – John Katsantas Nov 01 '20 at 13:24
  • @JohnKatsantas if you are able to store `vectorC` then you could do the same with `i * 1024`. – t.niese Nov 01 '20 at 14:22
  • Yes, I have done that in my solution with the 1D vector. I'm gonna try all the suggestions mentioned and see what runs faster in release mode. Thanks! – John Katsantas Nov 01 '20 at 14:30
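
A minimal sketch of the `std::copy` suggestion from the comments above, assuming the same `vector1D` and `vector2D` declared in the question:

// std::copy is declared in <algorithm>
for (int i = 0; i < 768; i++) {
    // copy one 1024-element row in a single call instead of an element-wise inner loop
    std::copy(vector1D.begin() + i * 1024,
              vector1D.begin() + (i + 1) * 1024,
              vector2D[i].begin());
}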

3 Answers


Yes.

You can remap the way you access the elements without needing to copy them. You can create a "view" class to achieve that:

#include <cstddef>
#include <iostream>
#include <vector>

// Non-owning 2D view over a flat std::vector: (row, col) maps to v[row * stride + col].
template<typename T>
class two_dee_view
{
public:
    two_dee_view(std::vector<T>& v, std::size_t row, std::size_t col)
        : v(v), stride(col) { if(v.size() < row * col) v.resize(row * col); }

    T& operator()(std::size_t row, std::size_t col)
        { return v[(row * stride) + col]; }

    T const& operator()(std::size_t row, std::size_t col) const
        { return v[(row * stride) + col]; }

    std::size_t col_size() const { return stride; }
    std::size_t row_size() const { return v.size() / stride; }

private:
    std::vector<T>& v;
    std::size_t stride;
};

int main()
{
    std::vector<double> v {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};

    two_dee_view<double> v2d(v, 2, 3);

    for(auto row = 0U; row < v2d.row_size(); ++row)
        for(auto col = 0U; col < v2d.col_size(); ++col)
            std::cout << row << ", " << col << ": " << v2d(row, col) << '\n';
}

Output:

0, 0: 1
0, 1: 2
0, 2: 3
1, 0: 4
1, 1: 5
1, 2: 6

The class simply maintains a reference to the std::vector you pass into the constructor, so you should only use the two_dee_view for as long as the original std::vector is alive, and no longer.
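
For the dimensions in the question, usage could look like the following sketch, assuming the `vector1D` from the question:

std::vector<float> vector1D(1024 * 768, 0.0f);
two_dee_view<float> view(vector1D, 768, 1024);  // 768 rows, 1024 columns

view(5, 10) = 3.14f;    // writes through to vector1D[5 * 1024 + 10]
float x = view(5, 10);  // reads the same element back; no copy was made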

Galik

It might be faster to use memcpy, as that is the lowest-level API for copying memory, and it is likely that the compiler can optimize it with specific instructions, etc., making it faster:

// memcpy is declared in <cstring>
for (int i = 0; i < 768; i++) {
    // copy one full 1024-element row per iteration
    memcpy(vector2D[i].data(), &vector1D[i * 1024], sizeof(float) * 1024);
}

Keep in mind that you shouldn't be using memcpy for anything but trivially copyable data. That is, it will work fine for float and int, but not for classes, as the copy constructor will not be called.
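
One way to make that precondition explicit is a compile-time check next to the copy (a sketch, assuming C++17 for the `_v` alias):

#include <type_traits>  // std::is_trivially_copyable_v

// Fails to compile if the element type ever stops being trivially copyable.
static_assert(std::is_trivially_copyable_v<float>,
              "memcpy is only valid for trivially copyable element types");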

Lyubomir Vasilev
  • Wow, I'm just checking in debug mode but computation time with this drops from about 50ms to 2ms. That will do! – John Katsantas Oct 31 '20 at 23:13
  • @JohnKatsantas comparing performance differences in debug mode is not really helpful. Something can be faster in debug but way slower in a release build. And measuring performance in a release build with optimizations on is tricky, because you need to avoid the compiler optimizing things away completely when you don't do anything with the result. – t.niese Oct 31 '20 at 23:19
  • @t.niese I've been coding for years but in university we never actually had to optimize our code in such detail. It's the first time I'm going through details like this. It's more troublesome than I expected. Until yesterday I didn't even know that debug mode slows down my code. Now you are telling me that something can be even slower in release mode. I'm so confused right now :P – John Katsantas Nov 01 '20 at 13:28
  • @t.niese But still, a reduction from 50ms to 2ms has to be meaningful even in debug mode. I mean, since I'm trying both ways in debug mode it should be ok to at least compare the two methods even though their actual timing (50 and 2 ms) may not be true. Right? – John Katsantas Nov 01 '20 at 13:30
  • @JohnKatsantas no, something that takes 50ms in debug could only require 0.1ms in release while something that takes 2ms in debug could take 1ms in release. You can't make any assumptions about what performs better in release with optimizations turned on based on measurements you do in a debug build (see the measurement sketch below). – t.niese Nov 01 '20 at 14:28
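
A minimal sketch of how such a measurement could be set up in an optimized release build, using `std::chrono` and reading part of the result afterwards so the copy is less likely to be optimized away:

#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

int main()
{
    std::vector<float> vector1D(1024 * 768, 1.0f);
    std::vector<std::vector<float>> vector2D(768, std::vector<float>(1024, 0.0f));

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 768; i++)
        std::memcpy(vector2D[i].data(), &vector1D[i * 1024], sizeof(float) * 1024);
    auto stop = std::chrono::steady_clock::now();

    // Reading part of the result keeps the copies from being discarded entirely.
    std::cout << "check: " << vector2D[767][1023] << '\n'
              << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us\n";
}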

If you have to use a vector of vectors for some reason, using memcpy or memmove is faster (because the copy happens in a single step, as described in another answer). But you should use the STL instead of doing it yourself.

vector<float> vector1D(1024 * 768, 0);
vector<vector<float>> vector2D(768, vector<float>(1024, 0));

// std::next is declared in <iterator>
for (int i = 0; i < 768; i++) {
  vector2D[i].assign(next(vector1D.cbegin(), 1024 * i),
                     next(vector1D.cbegin(), 1024 * (i + 1)));
}

This results in a straight memmove (depending on the STL implementation) but is safer, better optimized, and (arguably) more readable.

local-ninja