
Pros, I need some performance opinions on the following:

1st Question:

I want to store objects in a 3D grid structure. Overall it will be about 33% filled, i.e. 2 out of 3 grid points will be empty. A short image to illustrate:

[Image: illustration of the sparsely filled 3D grid, not available]

Maybe Option A)

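std::vector<std::vector<std::vector<std::deque<Obj>>>> grid(  // SizeX x SizeY x SizeZ
    SizeX, std::vector<std::vector<std::deque<Obj>>>(SizeY, std::vector<std::deque<Obj>>(SizeZ)));
grid[x][y][z].push_back(someObj);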

This way I'd have a lot of empty deques, but accessing one of them would be fast, wouldn't it?

The other option, B), would be:

std::unordered_map<Pos3D, std::deque<Obj>, Pos3DHash, Pos3DEqual> Pos3DMap;

where I add and delete deques as data is added or removed. This probably uses less memory, but it may be slower. What do you think?
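A minimal sketch of option B, assuming a simple Obj payload. The question does not show Pos3D, Pos3DHash or Pos3DEqual, so the definitions below are hypothetical (the hash uses boost::hash_combine-style mixing). A deque is created on the first insert at a position and erased once it becomes empty, so only occupied grid points cost memory, and "is this grid point empty?" is a single hash lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <unordered_map>

// Hypothetical stand-ins: the question does not define these types.
struct Obj { int id; };
struct Pos3D { int x, y, z; };

// Combine the three coordinate hashes (hash_combine-style mixing).
struct Pos3DHash {
    std::size_t operator()(const Pos3D& p) const noexcept {
        std::size_t h = std::hash<int>{}(p.x);
        h ^= std::hash<int>{}(p.y) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<int>{}(p.z) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

struct Pos3DEqual {
    bool operator()(const Pos3D& a, const Pos3D& b) const noexcept {
        return a.x == b.x && a.y == b.y && a.z == b.z;
    }
};

using SparseGrid = std::unordered_map<Pos3D, std::deque<Obj>, Pos3DHash, Pos3DEqual>;

// operator[] default-constructs the deque on first insertion at p.
void addObj(SparseGrid& grid, const Pos3D& p, const Obj& o) {
    grid[p].push_back(o);
}

// Remove the front object at p; drop the deque once it is empty so that
// only occupied grid points keep an entry.
void popObj(SparseGrid& grid, const Pos3D& p) {
    auto it = grid.find(p);
    assert(it != grid.end() && !it->second.empty());
    it->second.pop_front();
    if (it->second.empty()) grid.erase(it);
}

// A grid point is empty exactly when it has no entry in the map.
bool isEmpty(const SparseGrid& grid, const Pos3D& p) {
    return grid.find(p) == grid.end();
}
```

Erasing empty deques keeps the invariant "entry present means non-empty", which is what makes isEmpty correct without inspecting the deque.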

2nd Question (follow-up)

What if I had multiple containers at each position? Say, 3 buckets for 3 different entities (object types ObjA, ObjB, ObjC) per grid point. Does my data then essentially become 4D?

Another illustration: [image not available]

Using option 1B, I could just extend Pos3D to include the bucket number, which accounts for the even sparser data. Possible queries I want to optimize for:

  1. Give me all Objects out of ObjA-buckets from the entire structure
  2. Give me all Objects out of ObjB-buckets for a set of grid-positions
  3. Which is the nearest non-empty ObjC-bucket to position x,y,z?
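A sketch of option 1B with the bucket number folded into the key, covering queries 1 and 2. Pos4D, Pos4DHash, Pos4DEqual and the query functions are hypothetical names, and the mapping 0 = ObjA, 1 = ObjB, 2 = ObjC is an assumption. Note that query 1 becomes a scan over every stored entry, because a hash map cannot enumerate "all keys with bucket == 0" directly; query 2 costs one lookup per requested position.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <initializer_list>
#include <unordered_map>
#include <vector>

struct Obj { int id; };         // hypothetical payload type
struct Pos3D { int x, y, z; };  // hypothetical grid position

// Option 1B: the bucket index (0 = ObjA, 1 = ObjB, 2 = ObjC) is part of the key.
struct Pos4D { int x, y, z, bucket; };

struct Pos4DHash {
    std::size_t operator()(const Pos4D& p) const noexcept {
        std::size_t h = 0;
        for (int v : {p.x, p.y, p.z, p.bucket})  // hash_combine-style mixing
            h ^= std::hash<int>{}(v) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

struct Pos4DEqual {
    bool operator()(const Pos4D& a, const Pos4D& b) const noexcept {
        return a.x == b.x && a.y == b.y && a.z == b.z && a.bucket == b.bucket;
    }
};

using BucketMap = std::unordered_map<Pos4D, std::deque<Obj>, Pos4DHash, Pos4DEqual>;

// Query 1: every object of one bucket type in the whole structure.
// Without a secondary index this is a full scan over all entries.
std::vector<Obj> allInBucket(const BucketMap& m, int bucket) {
    std::vector<Obj> out;
    for (const auto& kv : m)
        if (kv.first.bucket == bucket)
            out.insert(out.end(), kv.second.begin(), kv.second.end());
    return out;
}

// Query 2: objects of one bucket type at a given set of grid positions,
// one hash lookup per position.
std::vector<Obj> inBucketAt(const BucketMap& m, int bucket,
                            const std::vector<Pos3D>& positions) {
    std::vector<Obj> out;
    for (const Pos3D& p : positions) {
        auto it = m.find(Pos4D{p.x, p.y, p.z, bucket});
        if (it != m.end()) out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return out;
}
```

If query 1 dominates, keeping one map per bucket type (instead of one combined key) would turn it into a plain iteration over that map.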

PS:

I had also thought about a tree-based data structure before, after reading about nearest-neighbour approaches. Since my data is so regular, I thought I could skip building the tree (recursively dividing the cells into smaller pieces) and just make a static 3D grid of the final leaves. That's how I came to ask about the best way to store this grid. A related question: if I have a map<int, Obj>, is there a fast way to ask for "all objects with keys between 780 and 790"? Or is building the above-mentioned tree the fastest way?
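For the map<int, Obj> question: std::map keeps its keys sorted, so a key range can be located with lower_bound and upper_bound in O(log n), and walking it costs one step per element found; no extra tree is needed. A minimal sketch, with Obj aliased to std::string purely for illustration and objectsInKeyRange a hypothetical helper name. An std::unordered_map offers no such range lookup, because it keeps no key order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Obj = std::string;  // hypothetical payload; any value type works

// Collect the values whose keys lie in the closed range [lo, hi].
std::vector<Obj> objectsInKeyRange(const std::map<int, Obj>& m, int lo, int hi) {
    std::vector<Obj> out;
    if (lo > hi) return out;          // empty range; also avoids first > last
    auto first = m.lower_bound(lo);   // first key >= lo
    auto last  = m.upper_bound(hi);   // first key >  hi
    for (auto it = first; it != last; ++it) out.push_back(it->second);
    return out;
}
```

For example, objectsInKeyRange(m, 780, 790) returns the objects with keys 780 through 790 inclusive, in ascending key order.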

EDIT

I ended up going with a 3D boost::multi_array with Fortran ordering. It's a bit like the chunks that games such as Minecraft use, which is perhaps a little like using a k-d tree with a fixed leaf size and a fixed number of leaves. It works pretty fast now, so I'm happy with this approach.
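For readers who would rather avoid the Boost dependency (the cross-compiling concern raised in the comments), a dense grid with Fortran (column-major) ordering is short to write by hand. This is a sketch, not Boost's implementation, and FortranGrid3D is a hypothetical name. The index x + nx·(y + ny·z) makes x the fastest-varying coordinate, which is the layout boost::fortran_storage_order gives a boost::multi_array.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Dense 3D grid in one contiguous buffer, Fortran (column-major) order:
// (x, y, z) and (x + 1, y, z) are neighbours in memory.
template <typename T>
class FortranGrid3D {
public:
    FortranGrid3D(std::size_t nx, std::size_t ny, std::size_t nz)
        : nx_(nx), ny_(ny), nz_(nz), data_(nx * ny * nz) {}

    T& operator()(std::size_t x, std::size_t y, std::size_t z) {
        return data_[index(x, y, z)];
    }
    const T& operator()(std::size_t x, std::size_t y, std::size_t z) const {
        return data_[index(x, y, z)];
    }

    // Linear offset of (x, y, z); x varies fastest, z slowest.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        assert(x < nx_ && y < ny_ && z < nz_);
        return x + nx_ * (y + ny_ * z);
    }

private:
    std::size_t nx_, ny_, nz_;
    std::vector<T> data_;
};
```

With this layout the fastest loop should run over x, so the natural loop nest is z outermost and x innermost.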

Bersaelor
  • The solution with vector will take up space in memory even for the empty "boxes", while in the second with the map you only have the "boxes" that are actually there. It's the usual speed-versus-memory tradeoff, and you have to decide which is more important: Speed or memory. – Some programmer dude Sep 19 '14 at 11:15
  • I don't like nested vectors, you could use a multi-dimensional array. e.g. boost::multi_array. – CashCow Sep 19 '14 at 11:22
  • hash tables have loads of unused memory space and I doubt it would use any more memory anyway. – CashCow Sep 19 '14 at 11:22
  • mhmm, I think my most used query will be "grid[x][y][z].empty?", is there some trick to access ranges of keys in unordered_maps? Because otherwise I have to check many grid-points sequentially whether they are empty.. – Bersaelor Sep 19 '14 at 11:28
  • @CashCow: I haven't decided whether I want to go through the trouble of importing boost. It's always so much extra work cross-compiling it and making it work with new iOS/Android versions and cpu-architectures.. – Bersaelor Sep 19 '14 at 11:30
  • Well it's easy enough to implement a 3-dimensional array yourself. – CashCow Sep 19 '14 at 11:37
  • The implementation is not the question, I just wondered whether I'd be using a lot of memory and performance up by saving and checking all these empty containers in this 4D dataset. – Bersaelor Sep 19 '14 at 11:44
  • You should consider using a k-d tree. – D Drmmr Sep 19 '14 at 18:04
  • Ended up using Boost, since a 3D boost::multi_array really is faster than a vector^3. What's more, I can set the storage order to fortran_storage_order, so I don't have to do any awkward zyx ordering with the C indices. – Bersaelor Sep 21 '14 at 10:38
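On the question in the comments about accessing key ranges in unordered_maps: an unordered container keeps no key order, so there is no range lookup, and checking a region means probing each grid point in it. A minimal sketch, assuming non-negative coordinates below 2^21 so that (x, y, z) packs into one 64-bit key; packKey and anyOccupied are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Pack (x, y, z) into one 64-bit key; assumes each coordinate is < 2^21.
std::uint64_t packKey(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(x < (1u << 21) && y < (1u << 21) && z < (1u << 21));
    return (std::uint64_t(x) << 42) | (std::uint64_t(y) << 21) | std::uint64_t(z);
}

// No key order means no range query: every grid point of the box
// [x0..x1] x [y0..y1] x [z0..z1] is probed individually.
bool anyOccupied(const std::unordered_set<std::uint64_t>& occupied,
                 std::uint32_t x0, std::uint32_t y0, std::uint32_t z0,
                 std::uint32_t x1, std::uint32_t y1, std::uint32_t z1) {
    for (std::uint32_t x = x0; x <= x1; ++x)
        for (std::uint32_t y = y0; y <= y1; ++y)
            for (std::uint32_t z = z0; z <= z1; ++z)
                if (occupied.count(packKey(x, y, z))) return true;
    return false;
}
```

With an ordered std::map and the same packing, the cells (x, y, z0) through (x, y, z1) form one contiguous key range, so lower_bound/upper_bound can find them in a single step per row.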

1 Answer


Answer to 1st question

As @Joachim pointed out, this depends on whether you prefer fast access or small data. Roughly, this corresponds to your options A and B.

A) If you want fast access, go with a multidimensional std::vector, or an array if you will. std::vector brings easier maintenance at minimal overhead, so I'd prefer it. It consumes O(N^3) space, where N is the number of grid points along one dimension. To get the best performance when iterating over the data, nest your loops so that the last index varies fastest: with grid[x][y][z], loop over x outermost and z innermost, because each grid[x][y] is one contiguous vector along z.
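A sketch of the loop order for option A, assuming the nested-vector grid from the question and a hypothetical Obj payload (makeGrid and countNonEmpty are illustrative names). Range-based for loops over grid, then grid[x], then grid[x][y] naturally put z innermost, so consecutive iterations touch neighbouring deques in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Obj { int id; };  // hypothetical payload
using Grid = std::vector<std::vector<std::vector<std::deque<Obj>>>>;

// Allocate a SizeX x SizeY x SizeZ grid of empty deques.
Grid makeGrid(std::size_t sx, std::size_t sy, std::size_t sz) {
    return Grid(sx, std::vector<std::vector<std::deque<Obj>>>(
                        sy, std::vector<std::deque<Obj>>(sz)));
}

// grid[x][y] is one contiguous std::vector over z, so the z loop is innermost.
std::size_t countNonEmpty(const Grid& grid) {
    std::size_t n = 0;
    for (const auto& plane : grid)        // x: outermost
        for (const auto& row : plane)     // y
            for (const auto& cell : row)  // z: innermost, contiguous
                if (!cell.empty()) ++n;
    return n;
}
```

Only the innermost level is contiguous here; each row and plane is a separate heap allocation, which is one reason a single flat buffer (as with boost::multi_array) tends to be faster.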

B) If you instead wish to keep things as small as possible, use a hash map, preferably one optimized for space. That results in O(n) space, where n is the number of stored elements. Here is a benchmark comparing several hash maps. I have had good experiences with google::sparse_hash_map, which has the smallest constant overhead I have seen so far. Plus, it is easy to add to your build system.

If you need a mixture of speed and small data or don't know the size of each dimension in advance, use a hash map as well.

Answer to 2nd question

I'd say your data is 4D if you have a variable number of elements along the 4th dimension, or a fixed but large number of elements. With option 1B) you'd indeed add the bucket index; for 1A) you'd add another vector.
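For option 1A with a fixed count of three buckets, a std::array per cell is one alternative to an extra vector level. A sketch with hypothetical names (Cell, BucketGrid, BucketA/BucketB/BucketC), storing all cells in one flat row-major buffer where z varies fastest; query 1 then becomes a single pass over that buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct Obj { int id; };  // hypothetical payload
enum Bucket : std::size_t { BucketA = 0, BucketB = 1, BucketC = 2 };

// Each grid cell holds three deques: the 4th dimension has a fixed size of 3.
using Cell = std::array<std::deque<Obj>, 3>;

struct BucketGrid {
    std::size_t sx, sy, sz;
    std::vector<Cell> cells;  // row-major: z varies fastest

    BucketGrid(std::size_t x, std::size_t y, std::size_t z)
        : sx(x), sy(y), sz(z), cells(x * y * z) {}

    std::deque<Obj>& at(std::size_t x, std::size_t y, std::size_t z, Bucket b) {
        assert(x < sx && y < sy && z < sz);
        return cells[(x * sy + y) * sz + z][b];
    }
};

// Query 1: all objects from one bucket type across the whole grid.
std::vector<Obj> allInBucket(const BucketGrid& g, Bucket b) {
    std::vector<Obj> out;
    for (const Cell& c : g.cells) out.insert(out.end(), c[b].begin(), c[b].end());
    return out;
}
```

The trade-off from the first question still applies: every cell pays for three empty deques, even at the 33% fill rate.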

  3. Which is the nearest non-empty ObjC-bucket to position x,y,z?

This operation is commonly called nearest neighbor search. You want a k-d tree for that. There is libkdtree++, if you prefer small libraries. Otherwise, FLANN might be an option. It is used by the Point Cloud Library (PCL), which handles a lot of tasks on multidimensional data and could be worth a look as well.
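A minimal k-d tree sketch for query 3, independent of libkdtree++ or FLANN: store the grid coordinates of every non-empty ObjC bucket as points, build the tree by median splits cycling through x, y, z, and search with the standard pruning step (visit the far side only if the splitting plane is closer than the best match so far). Point, KdNode, build and nearestTo are hypothetical names; the sketch does not support updates, so the tree must be rebuilt when buckets change state.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

using Point = std::array<int, 3>;  // grid coordinates of a non-empty ObjC bucket

struct KdNode {
    Point p;
    int axis;
    std::unique_ptr<KdNode> left, right;
};

// Build over pts[lo, hi) by splitting on the median along x, y, z in turn.
std::unique_ptr<KdNode> build(std::vector<Point>& pts, std::size_t lo, std::size_t hi, int depth) {
    if (lo >= hi) return nullptr;
    int axis = depth % 3;
    std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<KdNode>();
    node->p = pts[mid];
    node->axis = axis;
    node->left = build(pts, lo, mid, depth + 1);       // axis values <= pivot
    node->right = build(pts, mid + 1, hi, depth + 1);  // axis values >= pivot
    return node;
}

long long sqDist(const Point& a, const Point& b) {
    long long s = 0;
    for (int i = 0; i < 3; ++i) {
        long long d = static_cast<long long>(a[i]) - b[i];
        s += d * d;
    }
    return s;
}

// Descend towards the query first; visit the other subtree only if the
// splitting plane is closer than the best squared distance found so far.
void nearest(const KdNode* n, const Point& q, const KdNode*& best, long long& bestD) {
    if (!n) return;
    long long d = sqDist(n->p, q);
    if (d < bestD) { bestD = d; best = n; }
    long long diff = static_cast<long long>(q[n->axis]) - n->p[n->axis];
    const KdNode* nearSide = diff < 0 ? n->left.get() : n->right.get();
    const KdNode* farSide = diff < 0 ? n->right.get() : n->left.get();
    nearest(nearSide, q, best, bestD);
    if (diff * diff < bestD) nearest(farSide, q, best, bestD);
}

// Nearest stored point to q; the tree must be non-empty.
Point nearestTo(const KdNode* root, const Point& q) {
    assert(root);
    const KdNode* best = nullptr;
    long long bestD = std::numeric_limits<long long>::max();
    nearest(root, q, best, bestD);
    return best->p;
}
```

Build the tree once with build(pts, 0, pts.size(), 0) and then query nearestTo(root.get(), {x, y, z}) as often as needed.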

jotrocken
  • Thanks, that confirms what I already mentioned in the question: 1A) should be fast, 1B) uses less memory. But it's nice of you to explain it again for people who find this question. @2nd: I have worked with k-nearest neighbours before and was thinking about R-trees; thanks for the link to FLANN. I'll compare some performance and report back when I have results. – Bersaelor Sep 19 '14 at 14:52