
I am new to algorithms and am trying to find the most efficient way (in terms of memory and speed) to check whether a vector of ints (the sample vector) exists in a vector of vectors of ints (the population vector). I will illustrate the problem with an example.

A = {1,2,3,4,5,6,7,8}. These are the vertices of a cube. The six faces that can be formed from it are {1,2,3,4}, {5,6,7,8}, {1,2,6,5}, {2,3,7,6}, {3,4,8,7}, and {4,1,8,5}.

Now let B = {3,4,8,7}. I have to find out in how many A vectors of the population vector B exists as a face. (The population vector is made up of several As.)

I am using a hash function: I compare its value for B with its value for each of the 6 face vectors of A, and run a loop over all vectors of the population vector. Is there a better way to do it?

shekhar
  • You say that "the lengths of vector in population vector and the sample vector are not the same", then in your example they have the same length. How do you obtain the 6 4-dimensional vectors from an 8-dimensional one? It's better to show us the code you are using. – Costantino Grana Jun 15 '18 at 11:03
  • So, I have a vector of As (8D) obtained from 'gmsh', an open-source meshing software. Given 8 points, the faces (6 4D vectors) are formed by connecting the points in a certain known pattern. That's how I get the 6 vectors from A. So essentially my task boils down to finding whether B is one of the 6 face vectors of A. – shekhar Jun 15 '18 at 11:56

1 Answer


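Searching a vector is linear in the number of elements.

The overload of == for two vector<int> first checks that their sizes match and then compares the elements pairwise, so it can be used directly as the equality test.

Therefore, just use the idiomatic std::find: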

// vvint is the vector of vectors of ints; vint is the vector of ints to be found
// (requires <algorithm> and <vector>)
auto it = std::find(vvint.begin(), vvint.end(), vint);
if (it != vvint.end()) {
  // found: *it compares equal to vint
} else {
  // not found
}
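As a minimal sketch of the snippet above wrapped into reusable functions, the following assumes the population is stored as a `std::vector<std::vector<int>>` and that faces are compared in exact vertex order. The names `Face`, `contains_face`, and `count_face` are made up for illustration. Since the question asks "in how many" vectors B occurs, a `std::count` variant is included as well.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical alias: one face (or any sample vector) is a vector of ints.
using Face = std::vector<int>;

// Linear search: true if `sample` compares equal to some entry of `population`.
// Equality is order-sensitive, so {3,4,8,7} does not match {7,8,4,3}.
bool contains_face(const std::vector<Face>& population, const Face& sample) {
    return std::find(population.begin(), population.end(), sample) != population.end();
}

// Counts how many entries of `population` compare equal to `sample`.
std::size_t count_face(const std::vector<Face>& population, const Face& sample) {
    return static_cast<std::size_t>(
        std::count(population.begin(), population.end(), sample));
}
```

If faces with the same vertices in a different order should also match, one option is to sort a copy of each face before storing and before searching; whether that is appropriate depends on how the mesh data is used.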

Hash functions alone do not work, because they may produce collisions: two or more different vectors of 4 ints, as in your example, could have the same hash. A match by hash therefore does not guarantee that the vectors are actually equal unless you also check equality. (You could, however, design a hash function that maps each element of a restricted subdomain of your 4-int vectors to a unique hash.) In any case, this requires an additional unordered_map, which increases space complexity and adds overhead to insertions and deletions.
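To illustrate the hash-plus-equality-check idea, here is a hedged sketch using `std::unordered_map`, which performs the equality check itself whenever two keys land in the same bucket. The hash combiner in `FaceHash` is one common choice, not something prescribed by the standard library, and the names `FaceHash`, `FaceCounts`, `build_counts`, and `occurrences` are assumptions for this example.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

using Face = std::vector<int>;  // hypothetical alias

// Illustrative hash for a vector of ints: mixes each element into a seed.
// Collisions are still possible; the container resolves them with operator==.
struct FaceHash {
    std::size_t operator()(const Face& f) const {
        std::size_t seed = f.size();
        for (int v : f) {
            seed ^= std::hash<int>{}(v) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

using FaceCounts = std::unordered_map<Face, std::size_t, FaceHash>;

// Builds the extra index: each distinct face mapped to its number of occurrences.
FaceCounts build_counts(const std::vector<Face>& population) {
    FaceCounts counts;
    for (const Face& f : population) {
        ++counts[f];
    }
    return counts;
}

// Average constant-time lookup; returns 0 when the face is absent.
std::size_t occurrences(const FaceCounts& counts, const Face& sample) {
    auto it = counts.find(sample);
    return it == counts.end() ? 0 : it->second;
}
```

The index costs extra memory and has to be kept in sync with the population, so it tends to pay off only when many lookups are made against the same data.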

Binary search is possible if you keep the outer vector sorted, but this is rarely practical for a vector that changes. vector<int> already provides a lexicographic operator<, so no custom comparer is needed. Finds then become logarithmic, but each insertion or deletion requires a find to locate the position where to add or remove, and may then trigger a reallocation or a shift of many elements, which worsens performance.
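The trade-off described above can be sketched as follows, assuming the population is sorted once with the default lexicographic ordering of `std::vector<int>`. The function names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

using Face = std::vector<int>;  // hypothetical alias

// One-off O(N log N) step: sort the population lexicographically.
void sort_population(std::vector<Face>& population) {
    std::sort(population.begin(), population.end());
}

// O(log N) comparisons per lookup; only valid while the vector stays sorted.
bool sorted_contains(const std::vector<Face>& sorted_population, const Face& sample) {
    return std::binary_search(sorted_population.begin(), sorted_population.end(), sample);
}

// Keeps the order on insertion: the position is found in O(log N), but the
// insert itself may shift every later element, which is the overhead
// mentioned above.
void sorted_insert(std::vector<Face>& sorted_population, const Face& face) {
    auto pos = std::lower_bound(sorted_population.begin(), sorted_population.end(), face);
    sorted_population.insert(pos, face);
}
```

Sorting first is only worthwhile when the cost is amortized over many lookups; for a single find, the linear std::find does less work.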

Also, vectors are not meant to be kept sorted.

That is why other structures exist, such as std::set.
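For comparison, a minimal sketch with `std::set`, which keeps its elements ordered on its own and offers logarithmic lookup, insertion, and removal. Note that a set stores each distinct face only once, so it answers "does B exist" but not "how many times". The helper names are hypothetical.

```cpp
#include <set>
#include <vector>

using Face = std::vector<int>;  // hypothetical alias

// Copies the population into an ordered set (duplicates collapse to one entry).
std::set<Face> make_face_set(const std::vector<Face>& population) {
    return std::set<Face>(population.begin(), population.end());
}

// O(log N) lookup using the lexicographic ordering of std::vector<int>.
bool set_contains(const std::set<Face>& faces, const Face& sample) {
    return faces.find(sample) != faces.end();
}
```

If occurrence counts matter, `std::multiset` or a `std::map` from face to count would be closer to what the question asks.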

Attersson
  • @Atterson it can (often) be more efficient to populate vectors, sort them, and then perform an operation on them that requires sorted lists, instead of using `std::set` through and through. That is why `std::sort` exists. – rubenvb Jun 15 '18 at 13:15
  • I agree. Anyway, the OP's question asks for an efficient find, which is O(N). Sorting, O(N log N), before a find, now only O(log N), is actually worse for a single find. And keeping vectors sorted is not feasible due to resizing and shifting overhead. – Attersson Jun 15 '18 at 13:20
  • Thank you @Attersson. The method works faster than my original method (which was comparing element by element) and is more reliable than hashing. – shekhar Jun 19 '18 at 17:53
  • Nice, I am glad to have helped. – Attersson Jun 19 '18 at 20:04