Without knowing the language and platform it's hard to advise you on a detailed implementation, but my advice is similar to Lior Kogan's.
In my opinion, though, you only need two sets and no map:
Say you have two variables representing sets, A and B.
Every heartbeat removes the agent id from set A.
Every 5 seconds, a separate thread raises an alert for every agent id in B, then sets B = A, and finally creates a set containing all of the agent ids and assigns it to A (if the number of agent ids is really large, you can build the new set between one check and the next and sleep only for the remaining time).
Locking would only be needed while changing the variables pointing to each set, provided you use a lock-free set collection.
Performance will largely depend on the algorithmic complexity of that set implementation, and if you go down this route, you should favor the one with the best performance for removals (not necessarily the best big-O, for instance if worst-case latency matters to you).
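To make the two-set idea concrete, here is a minimal sketch in Python. The class and method names (`HeartbeatMonitor`, `heartbeat`, `check`, `run`) and the `on_alert` callback are made up for illustration. It also assumes that removing from a built-in `set` is effectively atomic under CPython's GIL, which stands in for the lock-free set mentioned above; on another platform you would swap in a real concurrent set.

```python
import threading
import time


class HeartbeatMonitor:
    """Two-set heartbeat tracker (illustrative names, not a reference implementation)."""

    def __init__(self, agent_ids, on_alert):
        self._all_ids = frozenset(agent_ids)
        self._on_alert = on_alert              # hypothetical callback: on_alert(agent_id)
        self._swap_lock = threading.Lock()     # guards only the references to A and B
        self._a = set(self._all_ids)           # A: agents not heard from in this interval
        self._b = set()                        # B: agents that missed the previous interval

    def heartbeat(self, agent_id):
        # Take the lock only long enough to read which set A currently is.
        with self._swap_lock:
            current_a = self._a
        # Assumption: discard() is atomic under the GIL (stand-in for a lock-free set).
        current_a.discard(agent_id)

    def check(self):
        # 1. Alert for everything in B (snapshot first, so iteration is safe).
        for agent_id in sorted(self._b):
            self._on_alert(agent_id)
        # 2. Build the fresh full set outside the lock, then swap references.
        fresh = set(self._all_ids)
        with self._swap_lock:
            self._b = self._a
            self._a = fresh

    def run(self, interval, stop_event):
        # Sleep only for whatever is left of the interval after each check.
        next_check = time.monotonic() + interval
        while not stop_event.wait(max(0.0, next_check - time.monotonic())):
            self.check()
            next_check += interval
```

You would call `heartbeat(agent_id)` from whatever receives the heartbeats and start `run(5.0, stop_event)` on its own thread. Note that, as described, heartbeats only touch A, so an id that has already moved to B is alerted at the next check even if the agent reports in the meantime; removing it from B as well would be a variation on the scheme.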
As a side note, if memory is not an issue or failures are relatively few, you can raise the alerts on a thread of their own and possibly get an interesting speedup, at the cost of keeping the old B set in memory. Again, the platform and runtime matter: in Erlang that would be a breeze, but on Windows the cost of creating a full-blown new thread might exceed the performance benefit if the failures are few.
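A rough sketch of that hand-off in Python, with made-up helper names (`raise_alerts`, `dispatch_alerts`). It reuses a worker from a `ThreadPoolExecutor` rather than creating a new thread per check, which is my substitution to sidestep the thread-creation cost, not something the scheme requires.

```python
from concurrent.futures import ThreadPoolExecutor


def raise_alerts(overdue_ids, on_alert):
    # Runs on the worker thread; on_alert is a hypothetical callback.
    for agent_id in overdue_ids:
        on_alert(agent_id)


def dispatch_alerts(executor, overdue_ids, on_alert):
    # The snapshot keeps the old contents of B alive until the worker is done,
    # which is the memory cost of this approach.
    snapshot = tuple(overdue_ids)
    if not snapshot:
        return None  # nothing overdue, so skip the hand-off entirely
    return executor.submit(raise_alerts, snapshot, on_alert)
```

The checker thread would pass the old B to `dispatch_alerts` and carry on with the swap right away; the returned future is only needed if you want to wait for the alerts or inspect errors.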