3

In a "self-avoiding random walk" situation, I have a 2-dimensional vector with a configuration of step-coordinates. I want to be able to check if a certain site has been occupied, but the problem is that the axis can be zero, so checking if the fabs() of the coordinate is true (or that it has a value), won't work. Therefore, I've considered looping through the steps and checking if my coordinate equals another coordinate on all axis, and if it does, stepping back and trying again (a so-called depth-first approach).

Is there a more efficient way to do this? I've seen someone use a boolean array with all possible coordinates, like so:

bool occupied[nMax][nMax];          // true if lattice site is occupied
for (int y = -rMax; y <= rMax; y++)
        for (int x = -rMax; x <= rMax; x++)
            occupied[index(y)][index(x)] = false;

But, in my program the number of dimensions is unknown, so would an approach such as:

typedef std::vector<std::vector<long int>> WalkVec;
WalkVec walk(1, std::vector<long int>(dof,0));
siteVisited = false; counter = 0;  
            while (counter < (walkVec.back().size()-1))
            {
                tdof = 1;
                while (tdof <= dimensions)
                {

                    if (walkHist.back().at(tdof-1) == walkHist.at(counter).at(tdof-1) || walkHist.back().at(tdof-1) == 0)
                    {
                        siteVisited = true;
                    }
                    else
                    {
                        siteVisited = false;
                        break;
                    }
                    tdof++;
                }

work where dof if the number of dimensions. (the check for zero checks if the position is the origin. Three zero coordinates, or three visited coordinates on the same step is the only way to make it true) Is there a more efficient way of doing it?

NictraSavios
  • 502
  • 2
  • 9
  • 29

2 Answers2

3

You can do this check in O(log n) or O(1) time using STL's set or unordered_set respectively. The unordered_set container requires you to write a custom hash function for your coordinates, while the set container only needs you to provide a comparison function. The set implementation is particularly easy, and logarithmic time should be fast enough:

#include <iostream>
#include <set>
#include <vector>
#include <cassert>

class Position {
public:
    Position(const std::vector<long int> &c)
        : m_coords(c) { }

    size_t dim() const { return m_coords.size(); }
    bool operator <(const Position &b) const {
        assert(b.dim() == dim());
        for (size_t i = 0; i < dim(); ++i) {
            if (m_coords[i] < b.m_coords[i])
                return true;
            if (m_coords[i] > b.m_coords[i])
                return false;
        }
        return false;
    }

private:
    std::vector<long int> m_coords;
};

int main(int argc, const char *argv[])
{
    std::set<Position> visited;
    std::vector<long int> coords(3, 0);
    visited.insert(Position(coords));

    while (true) {
        std::cout << "x, y, z: ";
        std::cin >> coords[0] >> coords[1] >> coords[2];
        Position candidate(coords);
        if (visited.find(candidate) != visited.end())
            std::cout << "Aready visited!" << std::endl;
        else
            visited.insert(candidate);
    }
    return 0;
}

Of course, as iavr mentions, any of these approaches will require O(n) storage.

Edit: The basic idea here is very simple. The goal is to store all the visited locations in a way that allows you to quickly check if a particular location has been visited. Your solution had to scan through all the visited locations to do this check, which makes it O(n), where n is the number of visited locations. To do this faster, you need a way to rule out most of the visited locations so you don't have to compare against them at all.

You can understand my set-based solution by thinking of a binary search on a sorted array. First you come up with a way to compare (sort) the D-dimensional locations. That's what the Position class' < operator is doing. As iavr pointed out in the comments, this is basically just a lexicographic comparison. Then, when all the visited locations are sorted in this order, you can run a binary search to check if the candidate point has been visited: you recursively check if the candidate would be found in the upper or lower half of the list, eliminating half of the remaining list from comparison at each step. This halving of the search domain at each step gives you logarithmic complexity, O(log n).

The STL set container is just a nice data structure that keeps your elements in sorted order as you insert and remove them, ensuring insertion, removal, and queries are all fast. In case you're curious, the STL implementation I use uses a red-black tree to implement this data structure, but from your perspective this is irrelevant; all that matters is that, once you give it a way to compare elements (the < operator), inserting elements into the collection (set::insert) and asking if an element is in the collection (set::find) are O(log n). I check against the origin by just adding it to the visited set--no reason to treat it specially.

The unordered_set is a hash table, an asymptotically more efficient data structure (O(1)), but a harder one to use because you must write a good hash function. Also, for your application, going from O(n) to O(log n) should be plenty good enough.

Julian Panetta
  • 391
  • 1
  • 6
  • 1
    Nice. I guess your comparison operator could use `std::lexicographical_compare`. – iavr Nov 18 '13 at 15:17
  • Oh, yes, that would have worked. Although, it wouldn't have caught the case where different-dimensioned coordinates were compared. – Julian Panetta Nov 18 '13 at 15:18
  • @JulianPanetta I'm not a proficient enough programmer to fully understand this solution, I'm going to read some references and see if I can implement this myself. – NictraSavios Nov 18 '13 at 15:32
  • @NictraSavios: I just added a bit of explanation. Hopefully it's more clear now. – Julian Panetta Nov 18 '13 at 16:39
2

Your question concerns the algorithm rather the use of the (C++) language, so here is a generic answer.

What you need is a data structure to store a set (of point coordinates) with an efficient operation to query whether a new point is in the set or not.

Explicitly storing the set as a boolean array provides constant-time query (fastest), but at space that is exponential in the number of dimensions.

An exhaustive search (your second option) provides queries that are linear in the set size (walk length), at a space that is also linear in the set size and independent of dimensionality.

The other two common options are tree structures and hash tables, e.g. available as std::set (typically using a red-black tree) and std::unordered_set (the latter only in C++11). A tree structure typically has logarithmic-time query, while a hash table query can be constant-time in practice, almost bringing you back to the complexity of a boolean array. But in both cases the space needed is again linear in the set size and independent of dimensionality.

iavr
  • 7,547
  • 1
  • 18
  • 53
  • Note: the space/time complexity of the proposed algorithms aren't independent of dimensionality (that would be impossible). Rather, the space-efficient algorithms are linear in dimensionality. Of course the number of dimensions is probably small, so as long as the dependence isn't exponential it can probably be ignored. – Julian Panetta Nov 18 '13 at 16:15