0

Do you remember my prior question: What is causing data race in std::async here?
Even though I successfully parallelized this program, it still ran too slowly to be practical.
So I tried to improve the data structure representing a Conway's Game of Life pattern.
Brief explanation of the new structure:

class pattern {
    // NDos::Lifecell represents a cell by x and y coordinates.
    // NDos::Lifecell is equality comparable, and has std::hash specialization.
private:
    std::unordered_map<NDos::Lifecell, std::pair<int, bool>> cells_coor;
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_neigh[9];
    std::unordered_set<decltype(cells_coor)::const_iterator> cells_onoff[2];
public:
    void insert(int x, int y) {
        // if coordinate (x,y) isn't already ON,
        // turns it ON and increases the neighbor's neighbor count by 1.
    }
    void erase(int x, int y) {
        // if coordinate (x,y) isn't already OFF,
        // turns it OFF and decreases the neighbor's neighbor count by 1.
    }
    pattern generate(NDos::Liferule rule) {
        // this advances the generation by 1, according to the rule.
        // (For example here, B3/S23)
        pattern result;
        // inserts every ON cell with 3 neighbors to result.
        // inserts every OFF cell with 2 or 3 neighbors to result.
        return result;
    }
    // etc...
};

In brief, pattern contains the cells. It contains every ON cells, and every OFF cells that has 1 or more ON neighbor cells. It can also contain spare OFF cells.
cells_coor directly stores the cells, by using their coordinates as keys, and maps them to their number of ON neighbor cells (stored as int) and whether they are ON (stored as bool).

cells_neigh and cells_onoff indirectly stores the cells, by the iterators to them as keys.
The number of ON neighbor of a cell is always 0 or greater and 8 or less, so cells_neigh is a size 9 array.
cells_neigh[0] stores the cells with 0 ON neighbor cells, cells_neigh[1] stores the cells with 1 ON neighbor cell, and so on.
Likewise, a cell is always either OFF or ON, so cells_onoff is a size 2 array.
cells_onoff[false] stores the OFF cells, and cells_onoff[true] stores the ON cells.
Cells must be inserted to or erased from all of cells_coor, cells_neigh and cells_onoff. In other words, if a cell is inserted to or erased from one of them, it must be so also for the others. Because of this, the elements of cells_neigh and cells_onoff is std::unordered_set storing the iterators to the actual cells, enabling fast access to the cells by a neighbor count or OFF/ON state.

If this structure works, the insertion function will have average time complexity of O(1), the erasure also O(1), and the generation O(cells_coor.size()), which are great improval of time complexity from the prior structure.

But as you see, there is a problem: How can I hash a std::unordered_map::const_iterator?
std::hash prohibits a specialization for them, so I have to make a custom one.
Taking their address won't work, as they are usually acquired as rvalues or temporaries.
Dereferencing them also won't work, as there are multiple cells that have 0 ON neighbor cells, or multiple cells that is OFF, etc.
So what can I do? If I can't do anything, cells_neigh and cells_onoff will be std::vector or something, sharply degrading the time complexity.

Community
  • 1
  • 1
Dannyu NDos
  • 2,458
  • 13
  • 32
  • Do you think maybe there's a reason `std::hash` prohibits a specialization for them? I guess I can see why you want this, but IMO it's another piece of evidence that storing iterators is not always the best idea. I don't have an alternative for you, though. BTW your map is missing a value type; did you mean `unordered_set`? – Lightness Races in Orbit Mar 04 '17 at 13:02
  • @LightnessRacesinOrbit I have no idea why `std::hash` does, but yes, I want to store iterators because there must be communication in them. If a cell is inserted to one, it must also be inserted to the others, etc. – Dannyu NDos Mar 04 '17 at 13:06
  • @Tony D Sorry about that, they must be `std::unordered_set`. Fixed. – Dannyu NDos Mar 04 '17 at 13:07
  • @DannyuNDos: What sort of volume are we talking about here? You could use a `std::set` instead if it's only a few elements - you [probably] won't notice a difference. – Lightness Races in Orbit Mar 04 '17 at 13:08
  • @LightnessRacesinOrbit Not `std::unordered_set`? But that will cost times. Yes, there will be few elements, but I'm searching for billions of Conway's Game of Life patterns. Few improvement of time complexity is just great. – Dannyu NDos Mar 04 '17 at 13:09
  • @TonyD, LightnessRacesinOrbit I edited the question for more specific explanation. – Dannyu NDos Mar 04 '17 at 13:32
  • 3
    Instead of storing an iterator as key, you could probably store `std::unordered_map::value_type*`, as obtained with `&*iter`. Or at least, you could implement a hash of the iterator by hashing `&*iter` pointer. – Igor Tandetnik Mar 04 '17 at 13:40
  • @IgorTandetnik That should work. Thank you very much! Why don't you post it as an actual answer? – Dannyu NDos Mar 04 '17 at 13:43
  • 2
    @DannyuNDos: _"Few improvement of time complexity is just great"_ No, only if you know you need it. If you only have a few elements, there is really no reason to go to a hashmap, especially when you.. y'know.. can't. A lookup within a tree of four or five elements may even be _faster_ than a hash calculation! (Probably not but you get the point) Understanding algorithmic complexity is important, but you must also consider the constant factor in your program. Measure actual performance. – Lightness Races in Orbit Mar 04 '17 at 14:53

1 Answers1

2

Short story: this won't work (really well)(*1). Most of the operations that you're likely going to perform on the map cells_coor will invalidate any iterators (but not pointers, as I learned) to its elements.

If you want to keep what I'd call different "views" on some collection, then the underlying container storing the actual data needs to be either not modified or must not invalidate its iterators (a linked list for example).

Perhaps I'm missing something, but why not keep 9 sets of cells for the neighbor counts and 2 sets of cells for on/off? (*2) Put differently: for what do you really need that map? (*3)


(*1): The map only invalidates pointers and iterators when rehashing occurs. You can check for that:

// Before inserting
(map.max_load_factor() * map.bucket_count()) > (map.size() + 1)

(*2): 9 sets can be reduced to 8: if a cell (x, y) is in none of the 8 sets, then it would be in the 9th set. Thus storing that information is unnecessary. Same for on/off: it's enough to store cells that are on. All other are off.

(*3): Accessing the number of neighbours without using the map but only with sets of cells, kind of pseudo code:

unsigned number_of_neighbours(Cell const & cell) {
  for (unsigned neighbours = 9; neighbours > 0; --neighbours) {
    if (set_of_cells_with_neighbours(neighbours).count() == 1) {
      return neighbours;
    }
  }
  return 0;
}

The repeated lookups in the sets could of course destroy actual performance, you'd need to profile that. (Asymptotic runtime is unaffected)

Community
  • 1
  • 1
Daniel Jour
  • 15,896
  • 2
  • 36
  • 63
  • Oh my. I missed that... Okay, for explanation, I need to access the cells by coordinate, neighbor count, or ON/OFF state quickly. `cells_coor`, `cells_neigh` and `cells_onoff` correspond to them, respectively. – Dannyu NDos Mar 04 '17 at 13:52
  • True, but what information do you get from the map that you couldn't also get from the sets? The count of neighbors? Look into each of the 9 sets. On/off? If the cell is not in the on set, then its off, isn't it? Btw, you can only keep a single on set and 8 sets for the neighbours. – Daniel Jour Mar 04 '17 at 13:55
  • The neighbor count and ON/OFF state **belong** to the coordinate... If I store coordinates, neighbor counts and ON/OFF states individually, I won't be able to know what belongs to what. – Dannyu NDos Mar 04 '17 at 13:58
  • Maybe I should try storing `std::reference_wrapper? – Dannyu NDos Mar 04 '17 at 14:24
  • @T.C. Indeed, I didn't knew that. Corrected that part, though I'll have to think about whether the original approach could then be better than my suggestion. – Daniel Jour Mar 05 '17 at 21:46