I have a std::vector<int>
with duplicate values. I can find the unique values using std::unique()
and std::vector::erase()
, but how can I efficiently find the vector of indices and construct the original vector given the vector of unique values, through an inverse mapping vector. Allow me to illustrate this using an example:
std::vector<int> vec = {3, 2, 3, 3, 6, 5, 5, 6, 2, 6};
std::vector<int> uvec = {3, 2, 6, 5}; // vector of unique values
std::vector<int> idx_vec = {0, 1, 4, 5}; // vector of indices
std::vector<int> inv_vec = {0, 1, 0, 0, 2, 3, 3, 2, 1, 2}; // inverse mapping
The inverse mapping vector is such that with its indices one can construct the original vector using the unique vector i.e.
std::vector<int> orig_vec(ivec.size()); // construct the original vector
std::for_each(ivec.begin(), ivec.end(),
[&uvec,&inv_vec,&orig_vec](int idx) {orig_vec[idx] = uvec[inv_vec[idx]];});
And the indices vector is simply a vector indices of first occurrence of unique values in the original vector.
My rudimentary solution is far from efficient. It does not use STL algorithms and is O(n^2)
at worst.
template <typename T>
inline std::tuple<std::vector<T>,std::vector<int>,vector<int>>
unique_idx_inv(const std::vector<T> &a) {
auto size_a = size(a);
std::vector<T> uniques;
std::vector<int> idx; // vector of indices
vector<int> inv(size_a); // vector of inverse mapping
for (auto i=0; i<size_a; ++i) {
auto counter = 0;
for (auto j=0; j<uniques.size(); ++j) {
if (uniques[j]==a[i]) {
counter +=1;
break;
}
}
if (counter==0) {
uniques.push_back(a[i]);
idx.push_back(i);
}
}
for (auto i=0; i<size_a; ++i) {
for (auto j=0; j<uniques.size(); ++j) {
if (uniques[j]==a[i]) {
inv[i] = j;
break;
}
}
}
return std::make_tuple(uniques,idx,inv);
}
Comparing this with the typical std::sort+std::erase+std::unique approach (which by the way only computes unique values and not indices or inverse), I get the following timing on my laptop with g++ -O3
[for a vector of size=10000
with only one duplicate value]
Find uniques+indices+inverse: 145ms
Find only uniques using STL's sort+erase+unique 0.48ms
Of course the two approaches are not exactly identical, as the latter one sorts the indices, but still I believe the solution I have posted above can be optimised considerably. Any thoughts how on I can achieve this?