I’m writing a simple Windows TCP/IP server application, which only needs to communicate with one client at a time. My application has four threads:

  1. Main program which also handles transmission of data as needed.
  2. Receive incoming data thread.
  3. Listen thread to accept connection requests from the client.
  4. A ping thread which monitors everything else, and transmits heartbeat messages as needed. I realise that heartbeats shouldn’t really be necessary with TCP/IP, but the client application (over which I have no control) requires them.

I’ve confirmed in task manager that my application does indeed have four threads running.

I’m using blocking TCP/IP sockets, but my understanding is that they only block the calling thread – the other threads should still be allowed to execute without being blocked. However, I have encountered the following issues:

  1. If the ping thread deems the connection to have died, it calls closesocket(). However, this appears to be being blocked by the call to recv() in the receive thread.

  2. The main application is unable to transmit data while the receive thread has a call to recv() in progress.

The socket is being created via the accept() function. At this stage I’m not setting any socket options.

I've now created a simple two thread program which illustrates the problem. Without the WSA_FLAG_OVERLAPPED flag, the second thread gets blocked by the first thread, even though this would appear to be contrary to what is supposed to happen. If the WSA_FLAG_OVERLAPPED flag is set, then everything works as I would expect.

PROJECT SOURCE FILE:
====================

program Blocking;

uses
  Forms,
  Blocking_Test in 'Blocking_Test.pas' {Form1},
  Close_Test in 'Close_Test.pas';

{$R *.res}

begin
  Application.Initialize;
  Application.CreateForm(TForm1, Form1);
  Application.Run;
end. { Blocking }

UNIT 1 SOURCE FILE:
===================

unit Blocking_Test;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, WinSock2;

type
  TForm1 = class(TForm)
    procedure FormShow(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;
  Test_Socket: TSocket;
  Test_Addr: TSockAddr;
  wsda: TWSADATA; { used to store info returned from WSAStartup }

implementation

{$R *.dfm}

uses
  Debugger, Close_Test;

procedure TForm1.FormShow(Sender: TObject);
const
  Test_Port: word = 3804;
var
  Buffer: array [0..127] of byte;
  Bytes_Read: integer;
begin { TForm1.FormShow }
  Debug('Main thread started');
  assert(WSAStartup(MAKEWORD(2,2), wsda) = 0); { WinSock load version 2.2 }
  Test_Socket := WSASocket(AF_INET, SOCK_DGRAM, IPPROTO_UDP, nil, 0, 0{WSA_FLAG_OVERLAPPED});
  assert(Test_Socket <> INVALID_SOCKET);
  with Test_Addr do
  begin
    sin_family := AF_INET;
    sin_port := htons(Test_Port);
    sin_addr.s_addr := 0; { this will be filled in by bind }
  end; { with Test_Addr }
  assert(bind(Test_Socket, @Test_Addr, SizeOf(Test_Addr)) = 0);
  Close_Thread := TClose_Thread.Create(false); { start thread immediately }
  Debug('B4 Rx');
  Bytes_Read := recv(Test_Socket, Buffer, SizeOf(Buffer), 0);
  Debug('After Rx');
end; { TForm1.FormShow }

end. { Blocking_Test }

UNIT 2 SOURCE FILE:
===================

unit Close_Test;

interface

uses
  Classes;

type
  TClose_Thread = class(TThread)
  protected
    procedure Execute; override;
  end; { TClose_Thread }

var
  Close_Thread: TClose_Thread;

implementation

uses
  Blocking_Test, Debugger, Windows, WinSock2;

type
  TThreadNameInfo = record
    FType: LongWord;     // must be 0x1000
    FName: PChar;        // pointer to name (in user address space)
    FThreadID: LongWord; // thread ID (-1 indicates caller thread)
    FFlags: LongWord;    // reserved for future use, must be zero
  end; { TThreadNameInfo }

var
  ThreadNameInfo: TThreadNameInfo;

procedure TClose_Thread.Execute;

  procedure SetName;
  begin { SetName }
    ThreadNameInfo.FType := $1000;
    ThreadNameInfo.FName := 'Ping_Thread';
    ThreadNameInfo.FThreadID := $FFFFFFFF;
    ThreadNameInfo.FFlags := 0;
    try
      RaiseException( $406D1388, 0, sizeof(ThreadNameInfo) div sizeof(LongWord), @ThreadNameInfo );
    except
    end; { try }
  end; { SetName }

begin { TClose_Thread.Execute }
  Debug('Close thread started');
  SetName;
  sleep(10000); { wait 10 seconds }
  Debug('B4 Close');
  closesocket(Test_Socket);
  Debug('After Close');
end; { TClose_Thread.Execute }

end. { Close_Test }

P.S. Since setting the WSA_FLAG_OVERLAPPED attribute has fixed the problem, I've posted the above for academic interest.

  • Can you please put the source code? `recv` should block only a thread where it resides. Otherwise, one connected client could block another client in a separate thread by issuing slow requests/responses. – karastojko Jan 02 '19 at 07:38
  • OK, but it'll take a little while for me to make something small enough which just contains the relevant sections. – Chris Hubbard Jan 02 '19 at 08:09
  • I’m still working on creating a small enough test program to list here, but in the meantime another thought has occurred to me: Can threads “own” other threads, and in so doing cause one thread to block another? My application is written in Delphi, and I’m using its standard TThread component. – Chris Hubbard Jan 02 '19 at 10:13
  • @MartynA on Windows, closing a socket indeed cancels blocking operations in progress in other threads. – Remy Lebeau Jan 02 '19 at 17:25
  • @ChrisHubbard are you, by chance, using your own critical section or mutex around blocking socket operations? Without seeing your actual code, that is the only way I can think of for this to happen. By themselves, blocking sockets do not behave the way you have described. – Remy Lebeau Jan 02 '19 at 17:34
  • @ChrisHubbard Also, having the main thread and the ping thread both transmit data can cause race conditions that can corrupt your messaging, if you don't sync access to the sockets. Best to only send from 1 thread. For instance, have the ping thread signal the main thread to send on its behalf. Or simply get rid of the ping thread and let the receive thread also handle sends for that client. – Remy Lebeau Jan 02 '19 at 17:39
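
If more than one thread does have to send, the synchronisation mentioned in the last comment could look roughly like the sketch below. It assumes a single critical section guarding every send on the connection; the unit name Guarded_Send, the function Send_Message and the variable Send_Lock are hypothetical names for illustration, and the exact declaration of send() differs between WinSock2 translations.

```pascal
unit Guarded_Send;

interface

uses
  WinSock2;

{ Sends Len bytes of Data as one uninterrupted message.
  Returns false if send() reports an error part-way through. }
function Send_Message(Socket: TSocket; const Data; Len: integer): boolean;

implementation

uses
  SyncObjs;

var
  Send_Lock: TCriticalSection; { hypothetical: one lock shared by all sending threads }

function Send_Message(Socket: TSocket; const Data; Len: integer): boolean;
var
  P: PAnsiChar;
  Sent: integer;
begin { Send_Message }
  Result := false;
  Send_Lock.Enter; { only one thread at a time may write to the socket }
  try
    P := @Data;
    while Len > 0 do
    begin
      { send() may accept fewer bytes than requested, so loop until done }
      Sent := send(Socket, P^, Len, 0);
      if Sent = SOCKET_ERROR then
        Exit; { the finally block still releases the lock }
      Inc(P, Sent);
      Dec(Len, Sent);
    end; { while }
    Result := true;
  finally
    Send_Lock.Leave;
  end; { try }
end; { Send_Message }

initialization
  Send_Lock := TCriticalSection.Create;

finalization
  Send_Lock.Free;

end. { Guarded_Send }
```

The lock is held only around send(), never around recv(), so a receive thread sitting in a blocking recv() cannot hold up a sender through this lock.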

2 Answers

If the ping thread deems the connection to have died, it calls closesocket(). However, this appears to be being blocked by the call to recv() in the receive thread.

That's just a bug in your code. You cannot free a resource in one thread while another thread is, or might be, using it. You will have to arrange some sane way to ensure that you don't create race conditions around access to the socket.

To be clear, there is no way you can know what that kind of code could possibly do. For example, consider:

  1. The thread actually hasn't called recv yet, it's about to call recv but the scheduler hasn't got around to it yet.
  2. The other thread calls closesocket.
  3. A thread that is part of a system library opens a new socket and happens to get the same socket descriptor you just closed.
  4. Your thread now gets to call recv, only it's receiving on the socket the library opened!

It is your responsibility to avoid these kinds of race conditions or your code will behave unpredictably. There's no way you can know what the consequence of performing random operations on random sockets could be. So you must not release a resource in one thread while another thread is, might be, or (worst of all) might be about to be, using it.

Most likely what's actually happening is that Delphi has some kind of internal synchronization that is trying to save you from disaster by blocking the thread that can't safely make forward progress.

David Schwartz
  • Thank you for your answer. In the case of the Ping thread, I could eliminate the need for it to close the socket (and therefore eliminate the potential race condition) provided I was able to add a receive timeout to the socket. So far my attempts to do this have been unsuccessful, hence why I have tried to add this function to the Ping thread. This then raises a new question: how does one add a receive timeout to a blocking socket? As per my original post, the socket is being created by the accept() function. – Chris Hubbard Jan 02 '19 at 11:04
  • If you have a new question to ask, ask a new question. – David Heffernan Jan 02 '19 at 11:32
  • @ChrisHubbard You could just have the ping thread shut the connection down without closing the socket. – David Schwartz Jan 02 '19 at 11:35
  • So it’s OK for the Ping thread to call shutdown() but not closesocket()? I was under the impression that closesocket() called shutdown() anyway. BTW, in this situation there is no need for an orderly shutdown. A hard shutdown is fine. – Chris Hubbard Jan 02 '19 at 12:55
  • Note that Windows does not use file descriptors for sockets, unlike how some other platforms do. Sockets are true kernel objects, so it is unlikely for a new socket to get the same value as a recently closed socket. It is also why you can't use non-socket descriptors with `select()`, and why `closesocket()` is needed instead of `close()`. – Remy Lebeau Jan 02 '19 at 17:30
  • @ChrisHubbard you can use `setsockopt(SO_RCVTIMEO)` to set a timeout for blocking receive calls. Or use `select()`. But, if you switch to using non-blocking sockets, you can eliminate all of your secondary threads and do everything in your main thread while still being able to service the UI. That is what Winsock's non-blocking feature was originally designed for, back when Windows didn't even support multiple threads yet. – Remy Lebeau Jan 02 '19 at 17:40
  • I have already tried using setsockopt(SO_RCVTIMEO), but it’s had no effect. According to the MSDN documentation: “If the socket is created using the WSASocket function, then the dwFlags parameter must have the WSA_FLAG_OVERLAPPED attribute set for the timeout to function properly. Otherwise the timeout never takes effect.” In my case the socket is being created by the accept() function, so this attribute doesn’t exist. – Chris Hubbard Jan 03 '19 at 00:17
  • @ChrisHubbard It's okay to call `shutdown` because that doesn't attempt to release a resource while another thread might be accessing it. The problem with `closesocket` is that it renders the handle the other thread is using (or may be about to use) meaningless or, worse, of unknown meaning. There's a huge race condition with `closesocket` based on exactly when it happens relative to other socket operations. There's no similar problem with `shutdown`. – David Schwartz Jan 03 '19 at 00:31

UPDATE: accept() creates the new socket with the same attributes as the socket used for listening. Since I hadn’t set the WSA_FLAG_OVERLAPPED attribute for the listen socket, this attribute wasn’t being set for the new socket, and options like the receive timeout didn’t do anything.

Setting the WSA_FLAG_OVERLAPPED attribute for the listen socket seems to have fixed the problem. Thus I can now use the receive timeout, and the Ping thread no longer needs to close the socket if no data has been received.

Setting the WSA_FLAG_OVERLAPPED attribute for the listen socket also seems to have resolved the issue of one thread blocking the others.
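
As a minimal sketch of the arrangement described above (not the actual application code): the listening socket is created with WSA_FLAG_OVERLAPPED so that the socket returned by accept() inherits the attribute, and SO_RCVTIMEO is then set on the accepted socket. The port, the 5 second timeout, and the names Listen_Socket, Client_Socket and Rx_Timeout_ms are assumptions for illustration. The code follows the same WinSock2 unit as the test program in the question; the declarations of bind(), accept() and setsockopt() differ between WinSock2 translations, so the pointer casts may need adjusting.

```pascal
program Rx_Timeout_Sketch;

{$APPTYPE CONSOLE}

uses
  Windows, WinSock2; { same units as the test program in the question }

const
  Listen_Port: word = 3804; { hypothetical: port reused from the test program }
  Rx_Timeout_ms = 5000;     { hypothetical: give up after 5 seconds of silence }

var
  wsda: TWSADATA;
  Listen_Socket, Client_Socket: TSocket;
  Listen_Addr: TSockAddr;
  Timeout: DWORD;
  Buffer: array [0..127] of byte;
  Bytes_Read, Err: integer;

procedure Check(OK: boolean; const What: string);
begin { Check }
  if not OK then
  begin
    WriteLn(What, ' failed, error ', WSAGetLastError);
    Halt(1);
  end;
end; { Check }

begin
  if WSAStartup(MAKEWORD(2,2), wsda) <> 0 then
    Halt(1);

  { The overlapped attribute is set here, on the listening socket }
  Listen_Socket := WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, nil, 0, WSA_FLAG_OVERLAPPED);
  Check(Listen_Socket <> INVALID_SOCKET, 'WSASocket');

  FillChar(Listen_Addr, SizeOf(Listen_Addr), 0);
  Listen_Addr.sin_family := AF_INET;
  Listen_Addr.sin_port := htons(Listen_Port);
  Listen_Addr.sin_addr.s_addr := 0; { INADDR_ANY: accept on any local interface }
  Check(bind(Listen_Socket, @Listen_Addr, SizeOf(Listen_Addr)) = 0, 'bind');
  Check(listen(Listen_Socket, 1) = 0, 'listen');

  { The accepted socket takes its attributes from the listening socket }
  Client_Socket := accept(Listen_Socket, nil, nil);
  Check(Client_Socket <> INVALID_SOCKET, 'accept');

  Timeout := Rx_Timeout_ms; { SO_RCVTIMEO takes a DWORD in milliseconds on Windows }
  Check(setsockopt(Client_Socket, SOL_SOCKET, SO_RCVTIMEO,
    PAnsiChar(@Timeout), SizeOf(Timeout)) = 0, 'setsockopt');

  repeat
    Bytes_Read := recv(Client_Socket, Buffer, SizeOf(Buffer), 0);
    if Bytes_Read > 0 then
      WriteLn('Received ', Bytes_Read, ' bytes')
    else if Bytes_Read = 0 then
      WriteLn('Client closed the connection')
    else
    begin
      Err := WSAGetLastError;
      if Err = WSAETIMEDOUT then
        WriteLn('No data within the timeout, treating the connection as dead')
      else
        WriteLn('recv failed, error ', Err);
    end;
  until Bytes_Read <= 0;

  { The receiving code closes its own socket, so no other thread has to }
  closesocket(Client_Socket);
  closesocket(Listen_Socket);
  WSACleanup;
end. { Rx_Timeout_Sketch }
```

When the timeout expires, recv() returns SOCKET_ERROR with WSAETIMEDOUT. The sketch then closes the socket from the same code that was receiving on it, which is what removes the need for the Ping thread to call closesocket().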

  • Note that the [`WSASocket()`](https://learn.microsoft.com/en-us/windows/desktop/api/winsock2/nf-winsock2-wsasocketw) documentation states: "*By default, a socket created with the `WSASocket()` function will not have this overlapped attribute set. **In contrast, the `socket()` function creates a socket that supports overlapped I/O operations as the default behavior**.*" So, if you use `socket()` to create the listening socket, then `accept()` will inherit the overlapped attribute. – Remy Lebeau Jan 03 '19 at 03:19