It's probably an unnecessary optimization, but I think I'd try to take slightly better advantage of the specification. A hash table is intended primarily for cases where you have a fairly sparse conversion from possible keys to actual keys--that is, only a small percentage of possible keys are ever used. For example, if your keys are strings of length up to 20 characters, the theoretical maximum number of keys is 25620. With that many possible keys, it's clear no practical program is going to store any more than a minuscule percentage, so a hash table makes sense.
In this case, however, we're told that the input is: "an array a that contains only numbers in the range 1 to a.length". So, even if half the numbers are duplicates, we're using 50% of the possible keys.
Under the circumstances, instead of a hash table, even though it's often maligned, I'd use an std::vector<bool>
, and expect to get considerably better performance in the vast majority of cases.
int firstDuplicate(std::vector<int> const &input) {
std::vector<bool> seen(input.size()+1);
for (auto i : input) {
if (seen[i])
return i;
seen[i] = true;
}
return -1;
}
The advantage here is fairly simple: at least in a typical case, std::vector<bool>
uses a specialization to store bool
s in only one bit apiece. This way we're storing only one bit for each number of input, which increases storage density, so we can expect excellent use of the cache. In particular, as long as the number of bytes in the cache is at least a little more than 1/8th the number of elements in the input array, we can expect all of seen
to be in the cache most of the time.
Now make no mistake: if you look around, you'll find quite a few articles pointing out that vector<bool>
has problems--and for some cases, that's entirely true. There are places and times that vector<bool>
should be avoided. But none of its limitations applies to the way we're using it here--and it really does give an advantage in storage density that can be quite useful, especially for cases like this one.
We could also write some custom code to implement a bitmap that would give still faster code than vector<bool>
. But using vector<bool>
is easy, and writing our own replacement that's more efficient is quite a bit of extra work...