I've been skimming through the Armadillo documentation and examples, but there seems to be no efficient built-in way to subsample (or resample) a large vector or matrix, so that N original elements are reduced to N / k. There are a few methods to shuffle and shift, but that's about it.

So I'm just looping over all elements sequentially, but surely there has to be a better way besides parallelizing over the available cores?

bool subsample(config& cfg, arma::mat& data, int skippCount)
{
    const size_t processor_count = 1; // keep at 1: the in-place copy lets one block overwrite columns another block still has to read

    const size_t cols = data.n_cols;
    const size_t period = skippCount + 1 ;
    size_t newCols = cols / period;
    newCols += (0 == (cols % period)) ? 0 : 1;
       
    const size_t blockSize = 256;
    std::vector<std::thread> workers;

    for (size_t blockID = 0; blockID * blockSize < newCols; ++blockID) { // also covers the trailing partial block
        workers.push_back(std::thread([&data, blockID, newCols, period]() { 
            // copy up to blockSize columns in place (overwrites columns that were already read)
            size_t c = blockID * blockSize;
            for (size_t b = 0; (c < newCols) && (b < blockSize); c++, b++) {
                arma::vec v = data.col(period * c); 
                data.col(c) = v;
            }
        }));

        if (workers.size()==processor_count) {
            for (auto& thread : workers) thread.join();
            workers.clear();
        }
    }
    for (auto& thread : workers) thread.join(); // make sure all threads finish
    data.resize(data.n_rows, newCols);
    return true;
}

Any suggestions to improve on this would be greatly appreciated. It would also be nice to do this in place, to save memory.

StarShine
  • "subsample" in what sense? To keep a subset of columns or rows, use [submatrices](http://arma.sourceforge.net/docs.html#submat) with the form `X.cols(vector_of_column_indices)` or `X.submat(vector_of_row_indices, vector_of_column_indices)`. For interpolation (which could be used for data reduction), there is [interp1()](http://arma.sourceforge.net/docs.html#interp1). To help with the signal processing operation of downsampling, try [conv()](http://arma.sourceforge.net/docs.html#conv), [conv2()](http://arma.sourceforge.net/docs.html#conv2) and [fft()](http://arma.sourceforge.net/docs.html#fft). – mtall Jan 18 '21 at 02:03
  • Oh wow, I missed those variants. They're not really in-place operations, but they seem at least close to what I was looking for, thanks. If you put it in an answer I'll accept. – StarShine Jan 18 '21 at 08:17
  • @mtall do you know if .cols() or .submat() are executing on NV hardware (GPU) if linking with nvblas? – StarShine Jan 20 '21 at 08:10
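
For the in-place variant the question asks about, here is a minimal sketch of strided column compaction on a plain column-major buffer, which is the layout `arma::mat` uses. It assumes a `std::vector<double>` in place of the Armadillo matrix so that it compiles without the library; the name `subsample_columns_inplace` is hypothetical and not part of Armadillo.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Armadillo function): keeps columns
// 0, period, 2*period, ... of a column-major buffer, compacting in place.
// Returns the number of columns kept.
std::size_t subsample_columns_inplace(std::vector<double>& data,
                                      std::size_t n_rows,
                                      std::size_t period)
{
    if (n_rows == 0 || period == 0) return 0;
    const std::size_t n_cols = data.size() / n_rows;
    if (period == 1) return n_cols; // nothing to drop

    const std::size_t new_cols = (n_cols + period - 1) / period; // ceil(n_cols / period)

    // Column 0 is already in place. For c >= 1 and period >= 2 the source
    // column period * c starts at or after the end of destination column c,
    // so a forward sequential pass never overwrites data it still has to read.
    for (std::size_t c = 1; c < new_cols; ++c) {
        const auto src = data.begin() + static_cast<std::ptrdiff_t>(period * c * n_rows);
        const auto dst = data.begin() + static_cast<std::ptrdiff_t>(c * n_rows);
        std::copy(src, src + static_cast<std::ptrdiff_t>(n_rows), dst);
    }

    data.resize(new_cols * n_rows); // drop the now-unused tail
    return new_cols;
}
```

The pass is memory-bound and touches each kept column once, so splitting it across threads is unlikely to help much, and it is only safe if no thread writes a column another thread still needs to read. With Armadillo itself, the non-in-place form from the comment above would presumably look like `data = data.cols(arma::regspace<arma::uvec>(0, period, data.n_cols - 1));` (untested here), at the cost of a temporary matrix.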

0 Answers