I use the Windows EvtSubscribe API in my program, which runs as a service, generally on Windows Server 2008 R2 Domain Controllers. It is registered for Kerberos logon events, and its purpose is to provide single sign-on for my application on the network.
I grab the username and IP address from each logon event and use them to pre-authenticate that IP address. This has worked well at a large number of sites, until it was deployed recently on an extremely large site (60,000 users logging on and off throughout the day). As far as I can tell from Process Monitor, the Domain Controller isn't under extremely high load, but the events are not being passed on to my application right away. They can be delayed by anywhere from 20 minutes to an hour.
I use the push subscription method as described in the API documentation. The code is identical.
In Event Viewer, the logon events appear in the Security log immediately when a user logs on to the domain. However, the same event is not pushed to my application until much later.
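One way to put numbers on this delay is to compare each event's creation timestamp with the time at which the subscription callback actually receives it. The following is a minimal sketch of that bookkeeping, not the code from my service. The names `fileTimeToUnixSeconds`, `deliveryLagSeconds` and `LagTracker` are hypothetical, and the sketch assumes both timestamps are available as FILETIME-style values (a 64-bit count of 100-nanosecond ticks since 1601-01-01 UTC). How the creation time is extracted from the event is left out.

```cpp
#include <cstdint>

// FILETIME-style timestamps count 100-nanosecond ticks since 1601-01-01 UTC.
constexpr std::int64_t kTicksPerSecond = 10000000LL;
// Seconds between 1601-01-01 and the Unix epoch (1970-01-01).
constexpr std::int64_t kEpochDifferenceSeconds = 11644473600LL;

// Convert a FILETIME-style tick count to whole seconds since the Unix epoch.
std::int64_t fileTimeToUnixSeconds(std::uint64_t fileTimeTicks) {
    return static_cast<std::int64_t>(fileTimeTicks) / kTicksPerSecond
           - kEpochDifferenceSeconds;
}

// Seconds elapsed between the event being written and the callback seeing it.
// A negative result would indicate clock skew or swapped arguments.
double deliveryLagSeconds(std::uint64_t createdTicks, std::uint64_t receivedTicks) {
    const std::int64_t diff = static_cast<std::int64_t>(receivedTicks)
                            - static_cast<std::int64_t>(createdTicks);
    return static_cast<double>(diff) / static_cast<double>(kTicksPerSecond);
}

// Hypothetical accumulator: tracks the worst lag seen and how many events
// arrived later than a chosen threshold.
struct LagTracker {
    double thresholdSeconds;
    double maxLagSeconds = 0.0;
    std::uint64_t total = 0;
    std::uint64_t overThreshold = 0;

    explicit LagTracker(double threshold) : thresholdSeconds(threshold) {}

    void record(double lagSeconds) {
        ++total;
        if (lagSeconds > maxLagSeconds) maxLagSeconds = lagSeconds;
        if (lagSeconds > thresholdSeconds) ++overThreshold;
    }
};
```

Calling `record(deliveryLagSeconds(created, received))` for every delivered event and logging the tracker periodically would show whether the lag grows steadily (suggesting a backlog) or arrives in bursts.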
I have never seen this occur at any of the other sites where my application is installed, and I'm wondering if it's a configuration issue on the servers themselves. The site with the delays has 4 clustered domain controllers in total, with my application running and reporting on each. All 4 periodically experience extended delays in receiving the events.
Has anyone else come across something similar, or have any ideas about what could be at play?
I have tried to replicate it using VMs and ADTest to generate load, without much luck.