I have an ASP.NET (v4.0) web app that is installed in a virtual directory (as an application) and is hosted in its own app pool. This is repeated for each instance of the app (i.e. per customer).

The app pools are integrated (not classic) mode and LoadUserProfile is set to true. Otherwise, default settings.

Each instance currently has its own copy of the code/config, and its own data folder (basic file reads/writes).

One instance of this app runs well (the operation used for comparison takes ~4 seconds). Every other instance runs slowly (10-25 seconds for the same operation).

If I move a slower instance to the "fastest" app pool, that instance springs to life. If I move the faster instance into a slower app pool, that instance slows to a crawl.

The app pools were initially created in the same way: manually. I later used the PowerShell copy routine to ensure an exact copy of the faster app pool, and still saw the same behaviour. Comparing the apppool.config files shows they are identical barring the virtual directory assignments.

There are no shared resources that are being blocked, so far as I can tell, and I tested that by shutting down the performant app pool and restarting... slow is still slow, and then when I restart that app pool (so it's loaded last) it's still faster...
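To make the fast/slow comparison repeatable, a small timing harness can help. This is only a sketch: it assumes the operation can be triggered with a plain HTTP GET, and the names `time_call`, `summarize`, and `fetch` are made up for illustration.

```python
import statistics
import time
import urllib.request


def time_call(operation, runs=5):
    """Run `operation` several times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (min, median, max) so a single outlier does not hide the typical case."""
    return min(durations), statistics.median(durations), max(durations)


def fetch(url, timeout=60):
    """Issue a GET request and read the whole body; the caller supplies the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `summarize(time_call(lambda: fetch("http://server/customerA/operation")))` would be run once per virtual directory; that URL is a placeholder, not a real endpoint of the app.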

  • And the app folders are byte-identical? – usr Aug 11 '12 at 19:06
  • Yes, just triple checked but the only difference is in the web.config where it specifies the name of the virtual directory it is being hosted under (and I double checked that was the only difference too) Every other file is byte-identical... –  Aug 12 '12 at 22:34
  • What is the app doing that is taking longer in one apppool? Can you attach VS and pause the debugger to profile it? – usr Aug 12 '12 at 22:42
  • I don't have VS installed on the server since it's a production system, but that looks like it's my next stop. I'm part way through adding massively verbose logging to the data access and file access components, as the tracing we have has shown nothing specific as yet. I'll get some more stats and add them ASAP. I was optimistic that someone may have encountered something similar so I could avoid that path –  Aug 13 '12 at 08:32
  • You can use xperf to capture sampled stack traces from all managed processes. It is a single no-install executable released by Microsoft. The tool is quite raw but you can nicely look into running processes. – usr Aug 13 '12 at 09:26
  • I have never experienced this, and you have probably tried these already, but just in case, this is where I'd start. Have you tried creating a third instance to see if that's slow too? Setting all permissions to "Everyone" just to check it's not the permissions (or changing the app pool user to Network Service and then setting that as the overriding user on your folders)? If you stop the fast instance, does the slow one still run slowly? Have you restarted IIS? Can you put the simplest possible site under both app pools and see if they still show the same effect? Can you re-upload the site just to make sure? –  Aug 13 '12 at 10:52
  • @usr - I'll look at getting xperf (looks like it's only available in the windows sdk now?) and give that a go - thanks for the tip –  Aug 13 '12 at 13:20
  • @Bex - thanks for thinking it through, but I have indeed tried everything you suggest: redeployment of code, "Everyone" permissions on folders, stopping the fast instance while the others were running (and also resetting all the others to ensure it's not a load order issue). I've actually got 4 instances of this app running in 4 virtual directories, and it's the same behaviour for all of the "other" instances, with only the single (and apparently magic) pool working as expected! –  Aug 13 '12 at 13:27
  • Oh I meant perfview http://www.microsoft.com/en-us/download/details.aspx?id=28567. Perfview is for managed, xperf more for native. – usr Aug 13 '12 at 13:27
  • @Ben I assume you have tried setting this up on your local (Win7) machine with 2 VDs to see if you get the same result? If so, VS on the server is, I think, your only way unless usr's suggestion helps. It will be something really silly you have overlooked, it always is, but unfortunately I can't think of anything else for you right now. –  Aug 13 '12 at 13:44
  • Since you are loading the user profile, that seems to be the key difference between those app pools. Check those users' temp locations and the file write permissions there, and if each pool connects to a database as its own user, check permissions on the DB as well. All it takes is for a query to time out for that user, so test with the same DB if possible. Good luck! –  Aug 13 '19:36
  • @snives good point that the user profiles themselves would actually be a difference. My thinking/understanding was (and still is) that the app pool identities are effectively identical to each other (?) so I'd not thought about that. I have, however, managed to get the other sites working if I create and run them as Admin-level users... this doesn't seem to explain why one app pool identity is working well and the others aren't - but at least gets things moving while I find out what the real issue is! –  Aug 14 '12 at 10:38
  • Have you checked the performance counters for each app pool to see its activity? Especially check the thread count, maybe one app is eating up all threads via the thread pool and starving the other app pool. Or other resources are being consumed, starving the other app pool. – Bart Verkoeijen Aug 29 '12 at 02:19
  • What are CPU/disk/network/other utilizations? Is it "active" workload or just waiting? – Peter Ivan Feb 25 '13 at 19:42
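The "byte-identical" check discussed in the comments above can be scripted rather than done by hand. A minimal sketch, assuming both app folders are readable from wherever the script runs; `hash_tree` and `diff_trees` are illustrative names, not part of any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(left, right):
    """Return sorted relative paths that are missing on one side or differ in content."""
    a, b = hash_tree(left), hash_tree(right)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

If the only expected difference is the virtual directory name in web.config, `diff_trees` should return just `["web.config"]` for a fast/slow pair of folders.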

4 Answers

To further isolate the problem, I would suggest running Wireshark (or another packet analyzer of your choice) on the host system for two sessions. I'm assuming that each app pool's site has either a unique IP address or a unique port assigned to it.

First get your baseline performance by filtering on the IP:port of your fast app pool. See what the traffic to and from the app looks like under "normal" conditions.

For the second run, capture traffic from the slow/unresponsive app pool. If all network routing up to the box is correct, you should see repeated requests in one direction, most likely to the app from elsewhere. If your app makes a lot of requests to another server, though, the traffic may be heavy on egress instead of ingress.

This test will tell you if the problem is within the app, or if it's TCP/IP related issues that result in requests to the app timing out due to low/no communication.

Correlate the timestamps of your tests with the server's event logs and (if applicable) tracesink logs, and you should be able to zero in on the problem.

George Erhard

Failed Request Tracing (FRT) will be your best tool to track this down. It will show the pipeline and how long each step took to complete. That should point out whether the delay is within the ASP.NET portion or within the IIS pipeline itself.

To set up FRT, from IIS at the site level, create an FRT rule with an HTTP status range of 200-999, and make sure to enable FRT (it's a separate step from the Actions pane).

Then reproduce the issue and look at the generated files (%SystemDrive%\inetpub\logs\FailedReqLogFiles\w3svc{siteid}). Open them in Internet Explorer.
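With several instances generating traces, it can be quicker to rank the files than to open each one. The sketch below rests on assumptions that should be checked against a real trace first: that each file's root element carries `url`, `appPoolId`, `statusCode`, and `timeTaken` (milliseconds) attributes, and that the files are named `fr*.xml`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_trace(root):
    """Pull headline fields from a trace's root element (attribute names are assumed)."""
    return {
        "url": root.get("url"),
        "app_pool": root.get("appPoolId"),
        "status": root.get("statusCode"),
        "time_taken_ms": int(root.get("timeTaken", "0")),
    }


def slowest_traces(folder, top=5):
    """Parse every fr*.xml file in folder and return the `top` slowest requests."""
    rows = [summarize_trace(ET.parse(path).getroot()) for path in Path(folder).glob("fr*.xml")]
    return sorted(rows, key=lambda row: row["time_taken_ms"], reverse=True)[:top]
```

Grouping the result by `app_pool` should show quickly whether only the slow pools produce long requests; the per-step timings still need the full trace view.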

Scott Forsyth

When IIS delays for long periods of time for no apparent reason, it usually means that it is waiting for an outside service to time out. When that happens, it tries another approach. For that matter, this is true of just about anything in Windows or Linux. The first suspect for me in these situations is always network name resolution configuration. It's guilty until proven innocent.

Can this be recreated with one user hitting one of the duplicate sites? It would be good to know what the CPU and disks are doing when you recreate the 10 to 20 second response time. If it appears that not much is going on during the 10-20 seconds then you should check your binding names and name resolution for the bound names. If the CPU or disks are chewing away then you will need to figure out which process is working too hard and why.

Please post your findings, I am curious.

PS I would also check name resolution and access to any authentication services that may be in play. For example, if the domain server needs to be consulted, make sure the first DNS server in your list is the DNS server for the domain.
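A quick way to test the name-resolution theory is to time lookups for every host name the app touches. A minimal sketch, assuming Python is available on (or near) the server; `time_lookup` is an illustrative name and the `resolver` parameter exists only so the function can be exercised without a network.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (seconds, sorted addresses or an error string) for one lookup of hostname."""
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
        # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the address.
        outcome = sorted({info[4][0] for info in results})
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        outcome = f"lookup failed: {exc}"
    return time.perf_counter() - start, outcome
```

Note that this runs as whoever launches the script, not as the app pool identity, so it only rules the general DNS configuration in or out. A lookup that takes several seconds, or fails only after a long wait, would fit the 10-25 second delays described in the question.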

BigTFromAZ

This is not the solution, but we will work towards it:

  1. Run IISRESET (during maintenance hours) or kill the W3WP that belongs to the unresponsive app pool

  2. Launch the application

  3. While it is unresponsive, grab a dump file of the W3WP process that belongs to the slow app pool. Use either Process Explorer or Task Manager to create the dump file

  4. Grab mscordacwks.dll and clr.dll (which replaces mscorwks.dll in .NET 4) from under C:\Windows\Microsoft.NET\Framework64\v4.0.XXXXX

  5. Zip these files and upload them somewhere I can download them from.

G33kKahuna