In my project I have to copy a lot of numerical data from a CUDA (GPU) device into a std::valarray (or std::vector), i.e. from the memory of the video card into the container.
So I need to resize these data structures as fast as possible, but when I call the member function vector::resize it initializes all elements of the array to the default value, with a loop.
// In a super simplified description, resize behaves like this pseudocode:
vector<T>::resize(N) {
    // Set up the new size
    // Allocate the new array
    this->_internal_vector = new T[N];
    // Initialize every element to the default value
    // This loop is slow!
    for (i = 0; i < N; ++i) {
        this->_internal_vector[i] = T();
    }
}
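To make the point concrete, here is a minimal sketch that counts how many elements resize() constructs. The `Counted` type and the `constructions_on_resize` helper are hypothetical names I made up for illustration, and the count assumes C++11 or later (in C++03, resize() copies a single default-constructed value instead).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: counts how often it is default-constructed.
struct Counted {
    static std::size_t constructions;
    double value;
    Counted() : value(0.0) { ++constructions; }
};
std::size_t Counted::constructions = 0;

// Returns the number of default constructions triggered by resize(n)
// on an empty vector (assumes C++11 or later).
std::size_t constructions_on_resize(std::size_t n) {
    Counted::constructions = 0;
    std::vector<Counted> v;
    v.resize(n);  // every new element is constructed here
    return Counted::constructions;
}
```

Calling `constructions_on_resize(N)` should report one construction per element, which is the per-element work I would like to skip.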
Clearly I don't need this initialization, because I have to copy the data from the GPU and all the old data is overwritten. The initialization takes some time, so I lose performance.
To copy the data I need allocated memory, which is what resize() provides.
A very dirty and wrong solution is to use vector::reserve(), but then I lose all the features of the vector, and if I call resize() afterwards the data is replaced with the default value.
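A small sketch of why reserve() alone is not enough: it only changes the capacity, so the vector still reports a size of zero, and a later resize() value-initializes the elements anyway. The `SizeInfo`, `after_reserve` and `resize_zeroes` names are hypothetical, used only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: size and capacity of a vector after reserve(n).
struct SizeInfo {
    std::size_t size;
    std::size_t capacity;
};

SizeInfo after_reserve(std::size_t n) {
    std::vector<double> v;
    v.reserve(n);  // allocates storage but constructs no elements
    SizeInfo info = { v.size(), v.capacity() };
    return info;
}

// Returns true if every element is 0.0 after reserve(n) followed by resize(n).
bool resize_zeroes(std::size_t n) {
    std::vector<double> v;
    v.reserve(n);
    v.resize(n);   // value-initializes all n elements
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] != 0.0) return false;
    }
    return true;
}
```

So after reserve() alone, size(), iterators and range-based access all see an empty vector, and writing to `v[i]` in that state is outside the valid range.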
So, is there a strategy for avoiding this pre-initialization to the default value (in valarray or vector)?
I want a resize method that behaves like this:
vector<T>::resize(N) {
    // Allocate the memory.
    this->_internal_vector = new T[N];
    // Update the size of the vector or valarray
    // !! DO NOT initialize the new values.
}
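For comparison, this is roughly the behaviour I get today from a raw `new double[N]`, wrapped in a minimal sketch (C++11). `RawBuffer`, `make_raw_buffer` and `fill_from_source` are hypothetical names, and the plain `std::copy` is only a stand-in for the real copy from the GPU. It avoids the initialization, but it is not a vector or valarray, which is exactly my problem.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>

// Hypothetical minimal buffer: owns n doubles and remembers n, nothing more.
struct RawBuffer {
    std::unique_ptr<double[]> data;
    std::size_t size;
};

// Allocates n doubles without initializing them.
RawBuffer make_raw_buffer(std::size_t n) {
    RawBuffer b;
    b.data.reset(new double[n]);  // default-initialization: values are indeterminate
    b.size = n;
    return b;
}

// Stand-in for the device-to-host copy: overwrites every element of b.
void fill_from_source(RawBuffer& b, const double* src) {
    std::copy(src, src + b.size, b.data.get());
}
```

The elements must not be read before they have been overwritten, since their values are indeterminate until then.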
An example of the performance:
#include <ctime>
#include <iostream>
#include <valarray>
#include <vector>

int main() {
    std::vector<double> vec;
    std::valarray<double> vec2;
    double *vec_raw;
    unsigned int N = 100000000;
    std::clock_t start;
    double duration;

    // Time reserve(): allocates storage without constructing elements.
    start = std::clock();
    // Dirty solution!
    vec.reserve(N);
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration reserve: " << duration << std::endl;

    // Time a raw allocation: no initialization for double.
    start = std::clock();
    vec_raw = new double[N];
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration new: " << duration << std::endl;

    // Time the explicit initialization of the raw array.
    start = std::clock();
    for (unsigned int i = 0; i < N; ++i) {
        vec_raw[i] = 0;
    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration raw init: " << duration << std::endl;

    start = std::clock();
    // Dirty solution: writes past size(), which is undefined behavior.
    for (unsigned int i = 0; i < vec.capacity(); ++i) {
        vec[i] = 0;
    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration vec init dirty: " << duration << std::endl;

    // Time valarray::resize(): allocates and initializes every element.
    start = std::clock();
    vec2.resize(N);
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "duration valarray resize: " << duration << std::endl;

    delete[] vec_raw;
    return 0;
}
Output:
duration reserve: 1.1e-05
duration new: 1e-05
duration raw init: 0.222263
duration vec init dirty: 0.214459
duration valarray resize: 0.215735
Note: replacing std::allocator does not seem to help, because the initialization loop is performed by resize() itself.