1

I've been struggling for the last couple of days with a bug I can't understand. This only happens in a Windows 2012 server (64bits), while it does not happen (at least) on the following windows versions: Windows XP (32 bits), Windows 7 (32 and 64bits), Windows 8 (64bits), Windows 8.1 (64 bits), Windows 10 (64 bits), Windows 2003 Server (32 bits).

Please note that the application in all cases is a 32 bit binary.

When the application starts it initializes some stuff, and in the end it creates a series of threads, all of them in suspended mode so that I can manually start them later.

In the following code, one would expect t to be started after the sleep. However, by the time I tried to explicitly call Start on the thread, I realized that it had already been started (somehow), leading to a thread already started exception (which would be ok considering that the thread had already been started).

// some initialization code is run before this, but nothing directly related
procedure foo;
var
  t : TThread; // note that there can be no reference to this thread from the outside since it's a local var
begin
  t := TMyThread.Create(true); // TMyThread

  sleep(3000); // time window of 3 seconds where the thread shouldn't have started

  t.Start; // now i want to start it manually, but the thread has already been started soon after it has been created
end;

The same happens also with regular (dummy) anonymous threads, so it wouldn't be TMyThread's fault:

// some initialization code is run before this, but nothing directly related
procedure foo;
var
  t : TThread; // note that there can be no reference to this thread from the outside since it's a local var
begin
  t := TThread.CreateAnonymousThread( // to avoid using my possibly faulted TMyThread class, I'm now testing with a regular anonymous thread and the problem persists
    procedure begin 
      log('Hi from anon thread');
      sleep(5000); // to make it live 5 seconds for testing purposes
    end); 

  sleep(3000); // time window of 3 seconds where the thread shouldn't have started

  t.Start; // now i want to start it manually, but the thread has already been started soon after it has been created
end;

Is there some mechanism present in Windows 2012 server regarding thread management that I'm missing?

LOG EXCERPTS:

The following is an excerpt from a healthy log file. You can see how the start method calls the startThreads method which in turns creates the first of the threads. The next lines represent the calls made from the CMP_AbstractThread's constructor. evInitialized and evStarted are TEvents created in CMP_AbstractThread's constructor.

12/03/2016 20:28:47.336: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMPPS_Application.start: start ...
12/03/2016 20:28:47.336: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMPPS_Application.createThreads: createThreads ...
12/03/2016 20:28:47.352: LOG_FINEST @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=0 (CMPP_EventsThread)).Create ...
12/03/2016 20:28:47.352: LOG_FINEST @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread)).Create.
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evInitialized).Create ...
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evInitialized).Create.
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evInitialized).resetEvent ...
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evInitialized).resetEvent.
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evStarted).Create ...
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evStarted).Create.
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evStarted).resetEvent ...
12/03/2016 20:28:47.352: LOG_DEBUG  @ PID=5576 ThreadID=5188 (TExternalThread) @ CMP_AbstractThread.: (PID=5576 ThreadID=6084 (CMPP_EventsThread).evStarted).resetEvent.

The following is an excerpt from a Windows 2012 session. You can see that the thread starts even before its own constructor has finished. The log lines coming from the threads Execute method are coming from thread 5864, the one that is still being created. In this specific case, the anomaly didn't even wait for the constructor to finish (it usually does). It would seem as if the TThread was created with False instead of True.

12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMPPS_Application.start: start ...
12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMPPS_Application.createThreads: createThreads ...
12/03/2016 20:29:31.813: LOG_FINEST @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=0 (CMPP_EventsThread)).Create ...
12/03/2016 20:29:31.813: LOG_FINEST @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread)).Create.
12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evInitialized).Create ...
12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evInitialized).Create.
12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evInitialized).resetEvent ...
12/03/2016 20:29:31.813: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evInitialized).resetEvent.

12/03/2016 20:29:31.813: LOG_FINEST @ PID=3796 ThreadID=5864 (CMPP_EventsThread) @ CMP_AbstractThread.Execute: (PID=3796 ThreadID=5864 (CMPP_EventsThread)) Executed.
12/03/2016 20:29:31.813: LOG_NORMAL @ PID=3796 ThreadID=5864 (CMPP_EventsThread) @ CMP_AbstractThread.Execute: (PID=3796 ThreadID=5864 (CMPP_EventsThread)) Down.

12/03/2016 20:29:31.828: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evStarted).Create ...
12/03/2016 20:29:31.828: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evStarted).Create.
12/03/2016 20:29:31.828: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evStarted).resetEvent ...
12/03/2016 20:29:31.828: LOG_DEBUG  @ PID=3796 ThreadID=4460 (TExternalThread) @ CMP_AbstractThread.: (PID=3796 ThreadID=5864 (CMPP_EventsThread).evStarted).resetEvent.

CODE EXCERPTS:

constructor CMP_AbstractThread.Create(suspended : boolean);
begin
  inherited Create(suspended); // this one sets up threadName
  FEvInitialized := CMP_Event.Create(threadName, 'evInitialized');
  FEvInitialized.ResetEvent;
  FEvStarted := CMP_Event.Create(threadName, 'evStarted');
  FEvStarted.ResetEvent;
end;

Constructor CMP_Thread.Create(suspended : boolean);
Begin
  log(LOG_FINEST, format('(%s).Create ...', [ThreadInfo]));
  inherited Create(suspended);
  FThreadName := threadInfo;
  log(LOG_FINEST, format('(%s).Create.', [ThreadName]));
End;

procedure CMPPS_Application.createThreads;
begin
  logGuard('createThreads', procedure begin
    FEventsThread := CMPP_EventsThread(CMPP_EventsThread.Create(true)
      .setDbConnection(FDB)
      .setLogger(GEventsLogger)
      .setDelay(MP_EVENTS_THREAD_DELAY));
  end);
end;

CMPP_EventsThread = class(CMPC_EventsThread)
  // all the following methods are inherited from different layers
  // Create(boolean) is inherited directly from CMP_AbstractThread
  // setDbConnection(IMP_DBConnection)
  // setLogger(IMP_Logger)
  // setDelay(integer)
end;
Nakilon
  • 34,866
  • 14
  • 107
  • 142
drakorg
  • 404
  • 4
  • 14
  • 3
    This seems pretty implausible. Can we have a complete program? – David Heffernan Mar 11 '16 at 22:42
  • I was afraid you'll ask for that. Sure, but it will take me a little time to extract a running demo. – drakorg Mar 11 '16 at 23:19
  • As I suspected, the demo appears to be running well. So, the problem must be in the code before the thread creation. But how can that code, which knows nothing about my thread (it's even a local variable), do anything to make it start on creation? It would look like something happens that makes future threads to start automatically, no matter that they are being created suspended. Any idea of such an scenario where the init code (represented by the comment line above) could make such a thing happen? Remember that this only happens on 2012 server. – drakorg Mar 12 '16 at 01:33
  • I think it likely that the defect is in your code. I'd concentrate attention there. – David Heffernan Mar 12 '16 at 07:20
  • Cut down your code until it stops exposing the problem, shouldn't be too hard? – whosrdaddy Mar 12 '16 at 13:16
  • @DavidHeffernan I'd put my money there, too. But I'm still dazzled by the kind of anomaly, and that it only happens on this windows version. – drakorg Mar 12 '16 at 15:56
  • @whosrdaddy Yes, that's what I've been doing so far to no avail yet. The fact that each compilation/deployment cycle takes around 20 minutes, doesn't help much either. – drakorg Mar 12 '16 at 16:05
  • 1
    Maybe a memory overwrite error. This could happen (or manifest itself) only on a certain hardware/OS combination. Have you tried with all runtime error detection on, Overflow and Range check? – Tom Brunberg Mar 12 '16 at 20:39

1 Answers1

5

What you are describing is fairly impossible under normal circumstances. If the ACreateSuspended parameter of the TThread constructor is True, the thread is created in a suspended state. No ifs, ands, or buts about that. Suspended threads cannot spontaneously start themselves.

Anonymous threads are always created suspended.

Assuming you are calling TThread.Start() on a valid TThread object, Start() raises an exception only if either:

  1. the FCreateSuspended member is False. This member is set to True in the TThread constructor if the ACreateSuspended parameter is True, and is set to False by Start(). You can call Start() only once.

  2. the FFinished member is True. This is set to True after TThread.Execute() and TThread.DoTerminate() have been called, meaning the thread had been running to completion. You cannot call Start() on a terminated thread.

  3. the FExternalThread member is True. This is set for the TThread object returned by TThread.GetCurrentThread(). You cannot call Start() on a thread that you did not explicitly Create().

  4. the above conditions are OK, but the underlying OS thread (represented by the TThread.Handle property) is not successfully transitioned from a suspended state to a running state. The only way that can happen is if something in your code is calling TThread.Suspend() or TThread.Resume() (or worse, something is calling the Win32 API SuspendThread() or ResumeThread() directly) to alter the thread's suspension count before you are calling TThread.Start().

Given the example code you have shown, these conditions are impossible. So the problem has to be related to code you have not shown.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • 1) +1 for the precise description of the raising of the exception. 2) I'm not worried about the exception being raised, because it is coherent with the state of things (the thread has already started). In this case, we are clearly in the first case you mentioned 3) If you had to write code that runs before my snippet to force this anomaly, what would you do? 4) We've been using this application for almost 8 years, non-stop, 7x24x365 on at least 30 installations, and we've never faced this problem. Why could it only happening on Windows 2012 server (64 bit)? – drakorg Mar 12 '16 at 04:07
  • Digging into point 3 of my comment, and quoting your "under normal circumstances": what do you think could be going on on the context for a thing like this to happen? When I saw that the same thing kept happening even on locally referenced "dummy" anonymous threads I threw in the towel. I fail to see anything that I could be doing to produce such anomaly. – drakorg Mar 12 '16 at 04:23
  • I can think of a few possibilities, but I would not want to speculate without more information. If you go into the Project Options and enable Debug DCUs, you can step into the source code for `Start()` and see exactly what condition it is failing on. I *suspect* the most likely culprit will be `ResumeThread()` returning a value other than 1 (thread transitioning from suspended to running). – Remy Lebeau Mar 12 '16 at 17:27
  • It will be hard for me for the moment to setup a debug session on the server. Despite of that, It's clear to me that the condition is 1) in your list. I have no doubt of that because I can actually see that the threads are running (they are producing log entries, doing what they are supposed to). I'm not so interested in knowing why my `Start()` is failing, as much as in knowing why the threads started. Regarding your reluctance to speculate, please feel free to. No harm will be done. – drakorg Mar 12 '16 at 23:25
  • As I described in my answer, there are several different ways that `Start()` can fail. And the fact that it is failing in your project means something is mishandling your threads behind your back. At least by stepping into the source code for `Start()`, you can see why. But, the fact that your created-as-suspended threads are running means that either something is hooking `CreateThread()` itself to disable the `CREATE_SUSPENDED` flag, or something is calling `TThread.Resume()`/`ResumeThread()` prematurely. Really hard to know what is happening since you don't have a test case to reproduce it. – Remy Lebeau Mar 12 '16 at 23:37
  • I added some log and code excerpts at the end, after the Thanks. It's not a test case, but it may be of some aid. – drakorg Mar 13 '16 at 05:39