
What is the time complexity of the gethash function? For example, in C++, searching a map takes O(log(n)), while for unordered_map it is O(1). Both facts are stated in their descriptions, but I cannot find any such reference for gethash in Lisp.

Actually, this extends to all standard library functions. Where can I find their complexity, or can I? I'm talking about SBCL, if that matters.

Andrew S.
  • In languages where the programmer can use high-level, “abstract” data types like “container”, “map”, “set”, etc., such questions must be answered in the language specification. In Common Lisp there are only concrete data structures, with concrete representations: how lists, arrays and hash tables are implemented is well known, and their complexity is discussed in any good book on data structures. For hash tables, for instance, you can see [this SO question](https://stackoverflow.com/questions/3949217/time-complexity-of-hash-table). – Renzo Oct 06 '18 at 19:58
  • Well, I was asking this especially because, as I mentioned in my question, a hash table can be ordered or unordered, and the time complexity differs between the two. – Andrew S. Oct 06 '18 at 20:04
  • 2
  • Hash tables, by definition, are based on hash functions, so they are not ordered. – Renzo Oct 06 '18 at 20:52

3 Answers

8

The reason the ANSI CL standard does not specify the algorithmic complexity of library functions is that it is not its job. The standard describes the behavior and leaves performance to the implementation-specific documentation. It was assumed that the best theoretical performance would be provided by all implementations (otherwise no one would use that implementation).

To answer your specific question, gethash is O(1) in all implementations.
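
As a quick illustration (a minimal sketch only; exact behaviour and timings are implementation-dependent, and the `fill-table` helper is made up for this example):

```lisp
;; Minimal sketch: a GETHASH lookup is expected to cost roughly the same
;; whether the table holds a hundred entries or a million.
(defun fill-table (n)
  (let ((table (make-hash-table :test #'eql)))
    (dotimes (i n table)
      (setf (gethash i table) (* i i)))))

(defparameter *small* (fill-table 100))
(defparameter *large* (fill-table 1000000))

;; Both lookups should take (roughly) constant time, unlike e.g. ASSOC on a
;; list, whose cost grows with the number of entries.
(gethash 42 *small*)   ; => 1764, T
(gethash 42 *large*)   ; => 1764, T
```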

sds
3

The usual expectation would be that a Lisp implementation of GETHASH runs in O(1).

But it might have surprising hidden costs. A copying garbage collector (which some GCs are) might move a hash table in memory. This can then trigger a rehash of the table.
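
For illustration, here is a hedged sketch of where such a hidden cost can come from in SBCL (using the SBCL-specific `sb-ext:gc`); the rehash happens inside the implementation and is not directly observable from portable code:

```lisp
;; Sketch (SBCL-specific): an EQ hash table hashes keys by identity, which
;; under a copying/moving GC is tied to the key's address in memory.
(defparameter *table* (make-hash-table :test #'eq))
(defparameter *key* (cons 'a 'b))        ; a freshly allocated object
(setf (gethash *key* *table*) :value)

;; A full GC may move *KEY* (and the table's own storage) to new addresses.
(sb-ext:gc :full t)

;; The next access may then have to rehash the table internally before it
;; can find the entry again.  The lookup still succeeds, but the occasional
;; rehash is a hidden cost on top of the usual O(1).
(gethash *key* *table*)                  ; => :VALUE, T
```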

Rainer Joswig
0

The standard does not and should not tell you the complexity of functions like gethash. Imagine if it did do that: this would constrain implementations of the language to use implementations of the functions which agreed with the complexity in the standard. If someone came up with a much better hashing function, then implementations could not use it.

Well, you could argue, that's silly: the standard merely needs to specify upper bounds on complexity. That would allow an implementation to use any better function that it liked, right? But that's also not an answer: there could be (and in many cases are) algorithms which have terrible worst-case performance but much better expected performance. To deal with that in the standard would either be impossible (I think) or would cause it to be dominated by complicated descriptions of what complexity was acceptable, and when it was and was not.

Providing upper complexity bounds would also rule out implementations which wanted to make tradeoffs between (say) how complicated and large the implementation is and how performant it is in some cases: an implementation should be allowed to have hash tables which are, internally, alists, for instance. These are typically very fast for small numbers of keys, but their performance falls apart for large numbers of keys. Such an implementation should be allowed.
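
A rough sketch of that tradeoff, comparing a plain alist queried with `assoc` to an ordinary hash table (the variable names are made up for the example):

```lisp
;; Sketch of the "alist as a small table" idea: ASSOC scans the list
;; linearly, so lookup is O(n) in the number of entries -- negligible for a
;; handful of keys, but not for thousands.
(defparameter *alist-table*
  '((:red . #xff0000) (:green . #x00ff00) (:blue . #x0000ff)))

(cdr (assoc :green *alist-table*))   ; => 65280

;; The same data in a real hash table: lookup is (expected) O(1), but there
;; is more per-table overhead, which is why an implementation might plausibly
;; prefer an alist-like representation while a table is still small.
(defparameter *hash-table* (make-hash-table :test #'eq))
(setf (gethash :green *hash-table*) #x00ff00)
(gethash :green *hash-table*)        ; => 65280, T
```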


There are some cases where the complexity of things is kind of obvious from the standard: it seems clear that the time complexity of length is linear in the length of the list (except that it may not terminate if the list is circular). But even that is not guaranteed: there's nothing to prevent an implementation from maintaining a length value somewhere, which would make length constant time in some cases. This would obviously be a heroic (to the point of being implementationally implausible, I think) and useless optimisation, but it is not the place of the standard to rule it out.
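
As a purely hypothetical sketch of what "maintaining a length value somewhere" could look like (the names `wrapped-list` and `wrapped-length` are invented for illustration; no implementation is claimed to do this for plain lists):

```lisp
;; Hypothetical sketch: a list wrapper that caches its length, making the
;; length query O(1) instead of O(n).
(defstruct (wrapped-list (:constructor %make-wrapped-list (items length)))
  items
  length)

(defun make-wrapped-list (items)
  (%make-wrapped-list items (length items)))   ; pay O(n) once, up front

(defun wrapped-length (wrapped)
  (wrapped-list-length wrapped))               ; O(1) every time afterwards

(defparameter *w* (make-wrapped-list '(a b c d e)))
(wrapped-length *w*)   ; => 5
```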

As an example of a case where a language (not an implementation of CL!) does something like this, consider this description of Racket's `list?` predicate:

> Returns `#t` if `v` is a list: either the empty list, or a pair whose second element is a list. This procedure effectively takes constant time due to internal caching (so that any necessary traversals of pairs can in principle count as an extra cost of allocating the pairs).

I don't completely understand this, but I think that Racket must implementationally have a flag in its cons object which tells you that its cdr is a proper list, and then rely on immutability to know that if this is ever true it's always true. If CL had such a function then it would be very much harder to make it run in constant time.

  • > *If someone came up with a much better hashing function, then implementations could not use it.* That is simply incorrect if the specification is given in "big O", which gives an upper bound. If the specification for an access says O(log n) and a constant-time implementation is provided, that is conforming; an O(1) function is also O(log n). But O(n log n) or O(n^2) would be nonconforming. – Kaz Oct 09 '18 at 23:55
  • @Kaz: I explained why specifying an upper bound would also be unduly constraining in the following paragraphs. –  Oct 10 '18 at 09:39
  • Indeed, I can't fathom why the C++ people thought it was a good idea to put stuff like this into the standard. Of course, "I can't fathom why they put that in" is a major theme in C++. – Kaz Oct 10 '18 at 19:36
  • @Kaz: Stuff like this is *in* the C++ standard? I feel a bit ill now. –  Oct 11 '18 at 17:52
  • It's alluded to right in the question! Yes; the C++ standard specifies asymptotic complexities for algorithmic operations. Evidently, it has [tightened up](https://en.wikipedia.org/wiki/Sort_(C%2B%2B)#Complexity_and_implementations) some of them over time. Now `std::sort` can't be implemented by something like quicksort, which can degenerate to quadratic behavior on some inputs. – Kaz Oct 11 '18 at 19:06