
My team works with the COM API of a large simulation application. Most simulation files run into the hundreds of megabytes and appear to get fully loaded into memory when they are opened.

The main task that we perform is iterating through all of the elements in the object model of the file and then doing 'something' to each element.

We recently moved our code base from .NET 2 to .NET 4 in VS 2010 and saw the iteration slow down by a factor of about 40 (from ~10 seconds to about 8 minutes). We reduced the problem to the smallest possible example (10 lines or so), compiled it in VS 2005 and ran it, then opened the same project in VS 2010 and compiled it again, leaving the target framework at .NET 2 (we are using the manufacturer-supplied COM interop assemblies).

Built with VS 2005, the test app completes in 10 seconds; built with VS 2010, it takes 8 minutes.

What could be causing this?

UPDATE

The code is equivalent to:

var server = new Server();
var elements = server.Elements;
var elementCount = elements.Count;

for (int i = 0; i < elementCount; ++i)
{
    var element = elements[i];
}

This code takes 40 times longer to run through VS 2010 than VS 2005.

UPDATE 2

I rationalised that the only reason that the operation can be dramatically slower in one case than the other is that data is transferred differently over COM in the different versions.

We recorded the binding logs for both cases with FUSLOGVW, and this is what we found: in the fast version no matching native image is found for mscorlib or CustomMarshalers, while in the slow version both native images bind successfully.

mscorlib

mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.HTM

Fast

LOG: Start binding of native image mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Start validating native image mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
WRN: Native image does not satisfy request. Looking for next native image.
WRN: No matching native image found.

Slow

LOG: Start binding of native image mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Start validating native image mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Bind to native image succeeded.

CustomMarshalers

CustomMarshalers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

Fast

LOG: Start binding of native image CustomMarshalers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a.
LOG: Start validating native image CustomMarshalers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a.
WRN: Native image does not satisfy request. Looking for next native image.
WRN: No matching native image found.

Slow

LOG: Start binding of native image CustomMarshalers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a.
LOG: Start validating native image CustomMarshalers, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a.
LOG: Start validating all the dependencies.
LOG: [Level 1]Start validating native image dependency mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089.
LOG: Dependency evaluation succeeded.
LOG: [Level 1]Start validating IL dependency Microsoft.VisualC, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a.
LOG: Dependency evaluation succeeded.
LOG: Validation of dependencies succeeded.
LOG: Start loading all the dependencies into load context.
LOG: Loading of dependencies succeeded.
LOG: Bind to native image succeeded.
Native image has correct version information.
Attempting to use native image C:\WINDOWS\assembly\NativeImages_v2.0.50727_32\CustomMarshalers\3e6deccf191ab943d3a0812a38ab5c97\CustomMarshalers.ni.dll.
Native image successfully used.

So it looks like we get a big performance boost when the native image is not used.

Why would this bind fail in one case and succeed in another, and how do we force the application not to use the native image?

UPDATE 3

The oddness continues. If I run this code in VS 2010 in a test method using the R# test runner, or the in-built Visual Studio test runner then it runs at the fast speed.

I have tried wrapping this code in an assembly and then loading that dynamically and that makes no difference.

Ciro Santilli OurBigBook.com
satnhak
  • I'm a little confused what moved from VS2005 to VS2010. Was it the COM server (the native C++ code) or the COM client (the C# code) or both? Have you tried to isolate which piece slows down? Have you used the code compiled in VS2005, but with the version of .NET installed by VS2010 (yes, the behavior of .NET 2.x gets updated when the later versions are installed). – Ben Voigt Oct 08 '12 at 18:19
  • Here native image does not, as far as I'm aware, mean native as in unmanaged, it means that it is a managed assembly that has been pre-compiled and optimised for the current architecture (in this case x86) rather than being JIT compiled: http://msdn.microsoft.com/en-us/library/vstudio/6t9t5wcf(v=vs.90).aspx. The really odd bit is obviously that the reverse is happening: the native library has dramatically worse performance. – satnhak Oct 08 '12 at 20:43
  • Not had much XP with ngen but is it worth trying ngenning the offending assembly again? Maybe there's some sort of issue with the native image. Failing that is it worth just removing the native version? – Charleh Oct 09 '12 at 16:40
  • Could it have been generated with Profile and Debug flags therefore causing additional debug gunk to be in the native assembly? – Charleh Oct 09 '12 at 16:45
  • @TheMouthofaCow: I'm not asking about your "native image" ngen-ed assembly. I'm talking about the actual COM server. Has that been recompiled using VS2010, or is it the same binary? – Ben Voigt Oct 09 '12 at 18:42
  • @BenVoigt - the COM server is a 3rd party API. I happen to know that it is written in C++ in VS 2008, but I think it is, on this occasion not anything to do with their code. – satnhak Oct 09 '12 at 19:01
  • @Charleh the native assembly is part of the mscorlib core framework library. However I will check the debugging flags when I get in tomorrow. – satnhak Oct 09 '12 at 19:03
  • @TheMouthofaCow: But the COM server has not been recompiled with a new version of VS? That's the important information I was trying to get, because the way you worded the title of your question makes it sound like the server, not the client, was moved to VS2010. In fact it now sounds as if the COM interface hasn't changed in the slightest, it's the *calls* to the API that are slower than before. – Ben Voigt Oct 09 '12 at 19:09
  • @BenVoigt - Yes, the COM API is provided by a 3rd party and it has not changed. There is just one interface. Our application is written in C#; when we upgraded to VS 2010 the performance degraded. It is the calls to the COM API that are slower. – satnhak Oct 09 '12 at 19:34
  • Is the performance decrease only when debugging under Visual Studio, or is it evident when running the compiled application without a debugger attached? – Andy Hopper Oct 09 '12 at 19:44
  • There have been reports of strange behavior for executables running under VS2010 that was ultimately tracked down to a Managed Debugging Assistant. You might review the question and the accepted answer at http://stackoverflow.com/questions/4348418/did-p-invoke-environment-change-in-net-4-0 – David W Oct 09 '12 at 19:54
  • It makes no difference whether a debugger is attached or not; it is exceptionally slow. However, when running through the MS Test runner it runs very quickly. – satnhak Oct 09 '12 at 20:16
  • @David W - thanks, unfortunately that doesn't help. – satnhak Oct 09 '12 at 20:26
  • @TheMouthofaCow *sigh* Sorry.....Are the MS Test runner instances run on the same physical machines from the same executable source location, or from a copy in a different folder? – David W Oct 09 '12 at 20:31
  • What threading model is your thread that creates the COM server and what apartment model is the server? Does it make any difference if you set ? – cirrus Oct 09 '12 at 20:43
  • Are you absolutely certain your threading models match? [StaThread] at the top of the calling method may make a difference? Also, I've seen cases where custom container classes are used to implement collections and there's a huge difference between using an iterator and indexing into the collection. –  Oct 09 '12 at 20:44
  • @David W - Yes, this is all happening on the same physical machine. – satnhak Oct 10 '12 at 08:28
  • @ebyrob - bingo it was the threading! Grrrr. Stick that in an answer and the bounty is yours my friend. – satnhak Oct 10 '12 at 08:38
  • @ebyrob - It turns out that we were using STAThread in our main application, the test app was using MTAThread - switching over to STAThread fixed the issue, however since our actual APP is already STA then this did not fix the issue. I tried running the same code on a background thread and it ran slowly. So it seems that calling the COM API from a background thread is the issue. PS bounty is still yours. – satnhak Oct 10 '12 at 09:51
  • I'd recommend turning the [assembly load trace](http://technet.microsoft.com/en-us/library/hh875651%28v=ws.10%29.aspx) on to see what bits of .NET VS2010 thinks its trying to load. I (am guessing) the VS2005 version is loading a different version of some interop assembly that is not found/valid for VS2010. – gbjbaanb Oct 09 '12 at 20:33
  • Thanks for everyone's help with this. The cause it seems is that even in an STA application threads are created with an MTA threading model and this means that the COM object is actually created on the main thread and every call is marshalled between threads (that is my understanding anyway). To resolve this it is necessary to create a `var t = new Thread()` and set `t.SetApartmentState(ApartmentState.STA)`; by default all threads are MTA - and you cannot use the thread pool (i.e. bgworker) as all thread pool threads are MTA only. – satnhak Oct 10 '12 at 12:56
  • If you want to call your object on a worker thread without going through a proxy then you need to initialize that thread as STA AND create the object from that thread in the first place. – cirrus Oct 10 '12 at 20:44
  • I have the same issue with an STAThread main app: when calling an interop assembly for a 3rd-party DLL, the app runs super slow. If I use a console app without STAThread and call the 3rd-party DLL, it runs fast. My solution for now is to use a console app, use Process.Start to run the slow task, and then use a file watcher to talk between the apps. :( – 1st4ck Feb 05 '15 at 14:27
  • I'm guessing that your main app is a UI app, i.e. it has an STA main thread. What happens if you create a new MTA thread and execute the code on that thread? https://msdn.microsoft.com/en-us/library/system.threading.thread.setapartmentstate%28v=vs.110%29.aspx – satnhak Feb 05 '15 at 15:02

2 Answers

6

It was kind of a long shot. Glad I could help.

Matching the threading model (MTA vs. STA) is really important when making lots of distinct calls into any COM object. An [STAThread] attribute at the top of a method is one way to be sure of the threading model for every call in that method.

It looks like Thread.SetApartmentState(ApartmentState.STA) works for a whole thread, but apparently not for thread pool threads.
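To make the difference between a dedicated thread and a thread pool thread concrete, here is a minimal sketch (not the asker's actual code). The names `ApartmentDemo` and `RunOnSta` are made up for illustration; only `Thread.SetApartmentState`, `Thread.GetApartmentState` and `ThreadPool.QueueUserWorkItem` are real framework calls. It assumes Windows, where COM apartments exist. As the follow-up comments note, [STAThread] only takes effect on an entry point such as Main, so a manually created thread has to call SetApartmentState before it is started.

```csharp
using System;
using System.Threading;

static class ApartmentDemo
{
    // Hypothetical helper: runs `work` on a new dedicated thread that is
    // switched to STA before it starts, and returns the result.
    public static T RunOnSta<T>(Func<T> work)
    {
        T result = default(T);
        var thread = new Thread(() => { result = work(); });
        thread.SetApartmentState(ApartmentState.STA); // must happen before Start()
        thread.Start();
        thread.Join();
        return result;
    }

    static void Main()
    {
        // Report the apartment state seen from inside the dedicated thread.
        Console.WriteLine("dedicated thread: " +
            RunOnSta(() => Thread.CurrentThread.GetApartmentState()));

        // Report the apartment state seen from inside a thread pool thread.
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine("pool thread: " +
                    Thread.CurrentThread.GetApartmentState());
                done.Set();
            });
            done.WaitOne();
        }
    }
}
```

On Windows the dedicated thread should report STA and the pool thread MTA, which is why a BackgroundWorker or any other pool-based mechanism cannot be used for this.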

  • Cheers, really appreciate it. Apparently I can't award the bounty for another two hours, but once I can it will be yours :) – satnhak Oct 10 '12 at 14:09
  • P.S. - If your background thread calls a method that's defined [STAThread] (and the loop is inside the STA method) then you should still see performance gain. Entering the apartment only once, instead of n times as it were. –  Oct 10 '12 at 16:50
  • I don't think that's quite correct either. If your background thread isn't the thread that created the object in the first place, it will go through a proxy whether it is STA itself or not. I don't think your answer is accurate. – cirrus Oct 10 '12 at 20:46
  • Note: after further testing: [STAThread] only works for certain functions like main(). It won't work in a manually created thread for instance, only Thread.SetApartmentState() works in that case. –  Oct 16 '12 at 15:47
  • But for clarity, SetApartmentState() isn't enough. The object must also be "owned" by that thread - It must create it first. Simply having multiple threads all marked STA and trying to call STA objects created on other threads (like in a thread-pool model) will be sloooow. – cirrus Oct 16 '12 at 17:34
2

When you say, "...even in an STA application threads are...", that isn't actually correct. A thread can choose to set up its apartment state before it accesses any COM objects, but in .NET if you do nothing those threads will implicitly be MTA.

The threadpool is MTA. It will need to be if you think about it, because if it were full of STA threads it would be a crappy thread-pool as any time a thread tried to access an object created on one of the other threads in the pool it would require marshalling.

Thread.SetApartmentState will only work per thread by definition. It could never affect any other threads (as you've discovered). Objects belong to an apartment and a thread may belong to a single threading model. If the thread tries to visit an object with a mismatched model it will need to be marshalled.

If your COM Server is marked as "both" then you can use it without a proxy from either an STA or an MTA thread. If that's the case, you're lucky, and you should create it on an MTA thread to begin with (or have the threadpool threads do so).

If you create it on an STA thread, even if (especially if) all your other threads are STAs, they will ALL go through a proxy, unless you happen to call the object from the thread that originally created it.

If your COM server is single threaded then you'll need to make sure you call it not only from an STA thread, but the STA thread that first creates it, otherwise you'll be marshalled through a proxy.
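The rule above (create the object on an STA thread and call it only from that same thread) can be sketched roughly as follows. This is an illustration under assumptions, not the vendor's API: `OwningStaThread` and `CreateAndUse` are hypothetical names, and a `List<string>` stands in for the COM collection so the sketch compiles without the interop assembly. With the real library, the factory would be something like `() => new Server()` from the question. It assumes Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class OwningStaThread
{
    // Hypothetical helper: creates the object AND uses it on one dedicated
    // STA thread, so no call has to cross an apartment boundary.
    public static void CreateAndUse<T>(Func<T> create, Action<T> use)
    {
        Exception failure = null;
        var thread = new Thread(() =>
        {
            try
            {
                T obj = create(); // created here, so it belongs to this thread's apartment
                use(obj);         // called from the creating thread
            }
            catch (Exception ex)
            {
                failure = ex;     // hand the error back to the caller
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // before Start()
        thread.Start();
        thread.Join();
        if (failure != null)
            throw new InvalidOperationException("Work on the STA thread failed.", failure);
    }

    static void Main()
    {
        // Stand-in data; the question's loop over server.Elements would go here.
        CreateAndUse(
            () => new List<string> { "a", "b", "c" },
            elements =>
            {
                for (int i = 0; i < elements.Count; ++i)
                {
                    var element = elements[i];
                    Console.WriteLine(element);
                }
            });
    }
}
```

The design choice is to keep the whole loop inside the `use` callback, so the object reference never leaves the thread that created it. Passing the reference to another thread, even another STA thread, would bring the proxy back.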

cirrus
  • I believe if it's all in a single process (COM server and COM client) then there is really only one thread for all STA's. If you call into the STA object many times all at once with any STA apartment, you should keep your "lock" on that single thread until someone else interrupts you, or until you're finished and so get some performance gain. For whatever reason, MTA threads seem to release this lock on every method return. (and performance suffers greatly in certain scenarios as a result) –  Oct 10 '12 at 16:40
  • I don't quite follow you. There's a 1:1 relationship between an STA and a thread. A thread creates an STA. Objects created by that thread will only exist in that apartment and are guaranteed to only ever be called by that thread. If a thread calls CoInitializeEx() it creates an STA distinct from the first STA, right? The thread must run a message pump to service messaged requests from other threads by proxy. Typically, your main thread will have a message pump, others may not. Basically, if you call a COM object on *any* other thread, whether MTA or STA, you'll go through a proxy to the first. – cirrus Oct 10 '12 at 20:28
  • All I'm trying to say is, I'd expect the other threads to also benefit from performance if they use the STA thread model particularly if it's the same thread that created the object even if it's not the first thread that created an object of that type. I'm pretty sure I've seen this in the past, but I'm going to have to write a test to see for sure. –  Oct 10 '12 at 20:58
  • I can't see how a STA based COM object can be called directly (legally) by more than one thread, moreover the creator thread. It's possible that there's an optimisation between MTA->STA and STA->STA but either way, it's going through a proxy. – cirrus Oct 10 '12 at 21:09
  • This seems somewhat true for Single threaded COM objects. However, I'm seeing Apartment threaded COM objects get performance benefits in sub-threads when Thread.SetApartmentState() is used (but not with [STAThread] directive which seems to have no effect) –  Oct 11 '12 at 14:58
  • So if you create an inproc COM Server on STA thread1, pass to, and use it from STA thread2 are you seeing the same performance? – cirrus Oct 11 '12 at 15:14
  • Do you mean use CoGetClassObject() instead of CoCreateInstance()? –  Oct 11 '12 at 15:29
  • No. I'm not sure why you said that. The test you need to do is to spin up two threads, set both to ApartmentState.STA, get them both to create a separate instance of the COM class (assuming it's not a singleton), and measure the timings. They should be the same. Then, share the COM reference with the other thread and see how long that takes. I would expect that to be slower, even though both are STA. – cirrus Oct 11 '12 at 17:19
  • That's what I'd done, it's very slow to access an object created in another thread. (60,000 ticks) fast if it was created in your own thread (50 ticks) slow if you're an MTA thread accessing STA object (2,600 ticks). –  Oct 12 '12 at 15:36
  • Right, so STA->STA is slow. So what I'm struggling to understand is how this correlates with your answer? – cirrus Oct 13 '12 at 08:20
  • STA->STA is fastest (50 ticks). Unless you create an object in one thread and call it in another. (not recommended: 60,000 ticks). –  Oct 16 '12 at 15:43
  • EXACTLY! It has to be the same thread. Your answer regarding simply creating an STA on a thread (STAThread or SetApartmentState()) could mislead people. It all depends on which thread created the object in the first place, and it's why your comment about STA for a thread-pool is effectively an oxymoron. – cirrus Oct 16 '12 at 17:31