
I'm currently implementing an algorithm that I would like to show can run in constant time, even with a very large number of elements.

Unfortunately, I need a data structure in which to store the elements. When the number of elements is very high, but not unreasonably high for my algorithm, neither std::vector nor std::valarray accesses an arbitrary element in constant time, as you can see in this graph.

Is there a better data structure to store the values? Are there any techniques that I can implement to reach constant-time access?

  • Are you sure you're measuring *access* time? Because simple [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) indexing is O(1). – Some programmer dude Mar 21 '17 at 13:49
  • What you see is probably a cache effect and no data structure can avoid it. With modern memory management, O(1) doesn't exist anymore. –  Mar 21 '17 at 13:49
  • It's hard to know for sure why your performance tanks at high vector sizes; we would need to see your benchmarking code and know which compiler you are using with what flags. It's likely that the performance drop is related to memory access and not the container's access algorithm. `std::vector`'s element access has constant time complexity, regardless of size. – François Andrieux Mar 21 '17 at 13:49
  • If you rely on CPU time spent on element access, you should take into account the whole mechanism of accessing data - cache levels, access patterns and whether disk storage is involved. Also, storing an element differs from reading it. – Lyth Mar 21 '17 at 13:51
  • Your only chance is to control the pattern of memory accesses and improve the locality of your algorithms. Depending on what you are doing, this will be possible... or not. –  Mar 21 '17 at 13:55
  • @Someprogrammerdude I'm storing std::clock() before and after accessing an arbitrary element in the vector, so that should be it? – Federico D'Ambrosio Mar 21 '17 at 13:58
  • @YvesDaoust: so what I'm getting is normal? – Federico D'Ambrosio Mar 21 '17 at 13:58
  • @Lyth: is there a practical way to compare between them? – Federico D'Ambrosio Mar 21 '17 at 13:58
  • @FedericoD'Ambrosio: yep, this is typical of modern computers, and probably a universal phenomenon. The larger the memory, the longer the access time. (From sub-nanosecond access in registers, to physically mounting a tape :-) As a first order approximation, the memory read/write times are a function of the distance from the previous access. –  Mar 21 '17 at 14:00
  • @FedericoD'Ambrosio benchmarking like you did is already a practical way to measure speed. It's just that there are much more factors involved, besides the choice of the container. So you could stay with the theory, have a nice O(1) in vectors/arrays and give a mathematical proof of your constant-time algorithm; but in reality you won't get constant time on *all* inputs due to environment (hardware and OS) – Lyth Mar 21 '17 at 14:20
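
For reference, here is a minimal sketch of this kind of measurement in the spirit of the std::clock approach mentioned above; the container sizes, the number of timed accesses, and the pre-generated random indices are choices made for this example:

```cpp
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <random>
#include <vector>

int main() {
    std::mt19937_64 rng(12345);
    const std::size_t accesses = 10000000;      // accesses timed per container size

    for (std::size_t n : {1u << 16, 1u << 20, 1u << 24, 1u << 26}) {
        std::vector<int> v(n, 1);

        // Pre-generate the random indices so the RNG cost stays out of the timed loop.
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        std::vector<std::size_t> idx(accesses);
        for (std::size_t& i : idx) i = dist(rng);

        long long sink = 0;                     // printed later, keeps the loop from being optimized away
        std::clock_t start = std::clock();
        for (std::size_t i : idx) sink += v[i];
        std::clock_t stop = std::clock();

        double ns = 1e9 * double(stop - start) / CLOCKS_PER_SEC / double(accesses);
        std::printf("n = %10zu: ~%7.2f ns per access (checksum %lld)\n", n, ns, sink);
    }
}
```

Plotting the nanoseconds-per-access figure against n typically reproduces the shape described in the question: roughly flat while the working set fits in cache, then rising once it spills into the larger cache levels and main memory.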

1 Answer

For high values of n it is very likely that:

You are hitting a caching problem. At some point, every memory access misses the cache, causing a longer memory load.

You are hitting a caching problem with memory paging. Modern computer memory is organized in a tree-like structure. Every memory access goes through that tree, making each access O(log n), where n is the size of the addressable memory space. You usually don't notice it because of the high arity of that tree and good caching. However, for very high n and random memory access this may become a problem.

A friend of mine, for example, was proving that the counting sort algorithm has O(n log n) time complexity because of its random memory access. The quicksort algorithm, for comparison, has very nice sequential access to memory, and its paging overhead is much, much lower.
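
To make the access-pattern contrast concrete, here is a sketch of a textbook counting sort; the key type and the key range k are assumptions for this example. The histogram increments jump around count[] essentially at random for random input, whereas quicksort's partition step only scans its current subarray sequentially:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counting sort for 32-bit keys in [0, k).  The histogram pass reads the
// input sequentially, but the increments of count[key] land at effectively
// random addresses; once count[] no longer fits in cache, most of those
// writes miss.
std::vector<std::uint32_t> counting_sort(const std::vector<std::uint32_t>& a,
                                         std::uint32_t k) {
    std::vector<std::size_t> count(k, 0);
    for (std::uint32_t key : a) ++count[key];       // scattered writes

    std::vector<std::uint32_t> out;
    out.reserve(a.size());
    for (std::uint32_t key = 0; key < k; ++key)     // sequential sweep of count[]
        out.insert(out.end(), count[key], key);
    return out;
}
```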

The bottom line is that you are most likely hitting architecture/OS memory-access overhead, something that you won't be able to overcome unless you use some really extreme approach (such as implementing your own OS).
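
One way to see this overhead directly is to touch the same large array once in order and once in shuffled order: both passes perform exactly the same number of constant-time index operations, yet the shuffled pass is typically several times slower. A sketch, with the array size, the shuffle, and the std::clock timing chosen for this example:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <numeric>
#include <random>
#include <vector>

// Sums v[order[0]], v[order[1]], ... and reports nanoseconds per element.
static void timed_sum(const std::vector<int>& v,
                      const std::vector<std::uint32_t>& order,
                      const char* label) {
    long long sink = 0;
    std::clock_t start = std::clock();
    for (std::uint32_t i : order) sink += v[i];
    std::clock_t stop = std::clock();
    double ns = 1e9 * double(stop - start) / CLOCKS_PER_SEC / double(order.size());
    std::printf("%-11s ~%.2f ns per element (checksum %lld)\n", label, ns, sink);
}

int main() {
    const std::size_t n = std::size_t(1) << 26;   // 64M ints, ~256 MiB
    std::vector<int> v(n, 1);

    std::vector<std::uint32_t> order(n);
    std::iota(order.begin(), order.end(), std::uint32_t(0));
    timed_sum(v, order, "sequential:");           // cache- and prefetcher-friendly

    std::shuffle(order.begin(), order.end(), std::mt19937_64(42));
    timed_sum(v, order, "shuffled:");             // mostly cache and TLB misses
}
```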

CygnusX1
  • An additional way round the problem is to use a cache-friendly algorithm. If you need to access each element exactly once, then you won't do much better than 0..n. OTOH, with something like transposing a matrix, it can be *much* faster to do it in square blocks, rather than lines. – Martin Bonner supports Monica Mar 21 '17 at 14:00
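
Here is a sketch of the blocked-transpose idea from the comment above; the row-major layout and the 32-element block size are assumptions, and the best block size depends on the cache:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Naive transpose of an n x n row-major matrix: the writes to dst stride
// through memory by n elements, so for large n nearly every write misses
// the cache.
void transpose_naive(const std::vector<double>& src,
                     std::vector<double>& dst, std::size_t n) {
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            dst[c * n + r] = src[r * n + c];
}

// Blocked (tiled) transpose: handles the matrix in B x B tiles so that the
// reads and writes of each tile stay in cache before moving on.
void transpose_blocked(const std::vector<double>& src,
                       std::vector<double>& dst, std::size_t n,
                       std::size_t B = 32) {
    for (std::size_t rb = 0; rb < n; rb += B)
        for (std::size_t cb = 0; cb < n; cb += B)
            for (std::size_t r = rb; r < std::min(rb + B, n); ++r)
                for (std::size_t c = cb; c < std::min(cb + B, n); ++c)
                    dst[c * n + r] = src[r * n + c];
}
```

Timed with the same std::clock pattern as above on a large matrix (say 8192 x 8192), the blocked version is usually several times faster, even though both versions perform exactly n² constant-time element accesses.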