I have a problem I'd like to code. I have a process that generates numbers from 0 through n-1, and I want to stop it at the first repeated element.* I'm looking for a data structure that makes this fast: both adding a new element and testing whether an element is already in the structure need to be fast. The expected number of insertions is around sqrt(n) (birthday problem), or actually a bit more (say sqrt(2n)) because the process slightly favors unique values. In other words, the set is rather sparse -- with numbers up to a billion, only about 30 to 50 thousand values will be used.
A hash set or some kind of self-balancing binary tree seems like the right approach, but maybe there's a better way? For small n I think a bit array would be superior, but I'm looking at n around 10^9, which I think is too large for that to be practical.
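For concreteness, here is a minimal Python sketch of the hash-set baseline, assuming the process can be wrapped as an iterator. The names `first_repeat` and `uniform_stream` are made up for illustration, and the uniform stream is only a stand-in for the real process (which slightly favors unique values).

```python
import random


def uniform_stream(n, seed=None):
    """Hypothetical stand-in for the real process: uniform draws from 0..n-1."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(n)


def first_repeat(values):
    """Consume an iterable until a value repeats.

    Returns (repeated_value, draws_used), or (None, draws_used) if the
    iterable ends without any repeat.
    """
    seen = set()  # hash set: average O(1) insert and membership test
    for draws, x in enumerate(values, start=1):
        if x in seen:
            return x, draws
        seen.add(x)
    return None, len(seen)


if __name__ == "__main__":
    value, draws = first_repeat(uniform_stream(10**9, seed=1))
    print(f"first repeat {value} after {draws} draws")
```

With n around 10^9 the set only ever holds tens of thousands of entries, so memory stays small regardless of n.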
* Actually, it doesn't need to stop right away -- if it's more efficient, the process can generate elements in blocks and check for a repeat only every now and then.
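The block-wise variant from the footnote could look roughly like the sketch below. It is only an illustration under the same assumption that the process is an iterator; `repeat_within_blocks` and `block_size` are hypothetical names. It detects that a repeat has occurred somewhere so far, not which draw caused it, so it may overshoot by up to `block_size - 1` values.

```python
from itertools import islice


def repeat_within_blocks(values, block_size=1024):
    """Pull values in blocks and check for a repeat once per block.

    Returns the total number of values generated when a repeat is first
    detected, or None if the iterable ends without any repeat.
    """
    it = iter(values)
    seen = set()
    total = 0
    while True:
        block = list(islice(it, block_size))
        if not block:
            return None  # stream exhausted, no repeat found
        total += len(block)
        seen.update(block)  # bulk insert of the whole block
        # If any value repeated, the set holds fewer items than were generated.
        if len(seen) < total:
            return total
```

Whether this beats the one-at-a-time check depends on how expensive the process is per value compared with a set lookup, which I have not measured.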
Note: This was originally posted on math.se but they recommended that I repost here. It's not research-level and so not suitable for cstheory.se.