Scalable Delphi TCP server implementation

Question

Any suggestions for components to use as a base for a scalable TCP server? I currently have an implementation that uses Indy, which works well for, say, 100 relatively active connections or 1,000 relatively inactive connections, but the one-thread-per-connection model limits the number of concurrent active connections that can be handled.

Let's say my goal might be 1,000 connections each processing 10 messages per second or 10,000 connections each processing 1 message per second on a good server (8-16 cores). Is this realistic? I'd really like to hear of any real-world implementations because I have found that what might work in theory does not necessarily work in practice and I do not want to be chasing a proposed solution that will not work.
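A back-of-the-envelope way to frame these two targets, sketched in Python (the function name is made up for illustration, and it assumes the load spreads evenly over all cores, which a real server will not achieve exactly). Both scenarios come out at the same aggregate message rate; what differs is the connection count, which under a one-thread-per-connection model is also the thread count.

```python
def message_budget(connections, msgs_per_conn_per_sec, cores):
    """Return (aggregate messages/sec, CPU microseconds available per message).

    Hypothetical helper: assumes perfectly even distribution across cores
    and ignores context-switch and kernel overhead.
    """
    total_msgs_per_sec = connections * msgs_per_conn_per_sec
    cpu_us_per_sec = cores * 1_000_000  # each core offers 1,000,000 us of CPU time per second
    return total_msgs_per_sec, cpu_us_per_sec / total_msgs_per_sec


# Scenario A: 1,000 busy connections on 8 cores
print(message_budget(1_000, 10, 8))    # (10000, 800.0)
# Scenario B: 10,000 quieter connections on 16 cores
print(message_budget(10_000, 1, 16))   # (10000, 1600.0)
```

Under these assumptions the per-message CPU budget is several hundred microseconds in both cases, so the raw message rate is probably not the hard part; the 1,000 versus 10,000 connection threads are.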

Edit: IOCP would be good, but I only want to use commercial-grade classes/components, so they need to be as "professional" as Indy or IP*Works before I would think of using them. Furthermore, I have no intention of "rolling my own" solution - it would take too much time to make it commercial-grade. Lastly, I am looking for a significant improvement on what I already have. I am sure I can squeeze at least 20-50% more out of what I have (based on Indy), but I am never going to be able to handle 10,000 concurrent clients, or 10,000 messages per second, no matter how hard I try. Whether there is something out there that meets these conditions is another matter.

I have decided to accept the answer referring to the IOCP classes, even though I have not used them, because they look like the best path for investigation at this stage.

Some pointers to IOCP solutions for Delphi can be found here: http://stackoverflow.com/questions/1072510/delphi-tclientsocket-replacement-using-winsock2-and-iocp and here http://stackoverflow.com/questions/2302267/is-there-a-i-o-completion-port-based-component-for-delphi — mjn, Aug 23 '11 at 06:57
See http://stackoverflow.com/questions/2302267/is-there-a-i-o-completion-port-based-component-for-delphi - it will give you the best performance for multiple simultaneous connections. — Arnaud Bouchez, Aug 23 '11 at 07:02
@mjn, Arnaud, there is nothing in those links on IOCP beyond an "I'll try it and see". Certainly nothing to even suggest there are commercial-grade offerings in this area. — Misha, Aug 23 '11 at 07:52
I know this is a really old post, but I am myself in search of a scalable solution. I am using the Indy TCP components, but they fail even at around 400+ active connections. What did you end up with, Misha? I would really appreciate a hint or more than a hint :) — Junaid Noor, Aug 01 '14 at 06:55

score 3 · Accepted Answer · answered Aug 23 '11 at 09:09

There is a project at http://voipobjects.com/ which is based on the former iopcclasses project.

It claims to handle thousands of simultaneous connections:

IOCP engine is set of classes, components and routines for rapid creation high scalable and performance TCP/UDP applications. Application created using IOCP classes can handle thousands simultaneous connections.

Library is written in Delphi - Delphi 7 - 2010 are supported.

Library uses IO completion ports technology. There is most powerful technology in Win32 world for creation highly scalable and performance TCP/UDP applications. This technology is supported in all desktop Windows OSes except old Win9x/WinME versions.

This library is licensed under MPL1.1. Also It includes some files from Jedi project (Winsock2 header translation).

https://bitbucket.org/voipobjects/iocpengine

This looks far and away the most promising. Currently active (relatively recent bug-fixes) with a significant code base. I will definitely check this out. — Misha, Aug 23 '11 at 10:57

score 2 · Answer 2 · answered Aug 23 '11 at 00:39

My favorite Delphi network layer is ICS by Francois Piette. It's fantastically easy to understand, very scalable, and ultra-high performance. Free, and open source. Will probably scale to 1000 clients for most people, without significant effort, and without the complexity that gives me trouble when I use Indy.

I got about a 20% scalability/performance boost from switching all my stuff from Indy to ICS.

Warren P

I will check out ICS as well. – Misha Aug 23 '11 at 00:59
I was not aware that ICS uses I/O completion ports. Can you confirm this? BTW, Indy can be used non-blocking, so it is possible with a fair bit of work to share connections across threads. However, I think that this might be a significant piece of work. – Misha Aug 23 '11 at 06:15
@Misha I just deleted my comment. I'm not sure ICS is not blocking, on the contrary. I confused ICS with IOCP. – Arnaud Bouchez Aug 23 '11 at 06:59
I also like ICS. But be aware it uses Windows messages *a lot*, and it's very sensitive to the amount of time it takes *you* to process those messages. If the 1000 simultaneous connections do heavy I/O traffic, ICS might fail. – Cosmin Prund Aug 23 '11 at 07:06
@Cosmin, I want to avoid a Windows message-based solution so that would rule out ICS. – Misha Aug 23 '11 at 07:55

score 1 · Answer 3 · answered Aug 22 '11 at 17:10

You should look at RealThinClient SDK http://www.realthinclient.com/about.htm

Well proven solution. Good support. Test results for different server solutions on the home page.

LU RD

I really need the flexibility of a TCP-based solution (a lot of stuff I do pushes unsolicited notifications from server to client) over an HTTP one, although this is still worth investigating. – Misha Aug 22 '11 at 22:03

score 1 · Answer 4 · edited May 23 '17 at 10:29

The real deciding factor is what you plan on doing in each of those transactions.

I use Indy with Network Load Balanced Windows servers. One of these Delphi applications is serviced by 3 physical servers listening on one public IP address, where we have received millions of requests since yesterday with zero errors. Load overnight is pretty idle, so the actual requests are around 350/second/server during the day and there's plenty of room for growth.

If there's not a lot of CPU/Memory needed per transaction you might get away with it on one box using Indy. It all depends on the load...as you likely can't write to 1000 different files every second.

There are other items to worry about too, like the OS supporting this amount of activity. You may need to tweak some registry settings. (see this Stack Overflow question)

IOCP is the way to go for ultra-capacity servers. I have used Indy for ease of use in implementation/debugging for a very long time. I have my own IOCP implementation that I wrote years ago but never rolled it out on production as we simply haven't needed to.

My simple advice - I'd highly suggest rolling it out in Indy, using NLB as your crutch for load, and after that if you are still desiring the utmost speed, write your own IOCP implementation so you can craft it towards your specific requirements. Note that this is based on knowing nothing of the actual implementation requirements.

answered Aug 22 '11 at 19:08

Darian Miller

I have deployed in-production solutions based on Indy for the last 3 years. I like it and it has worked for me so far, with systems with 100-200 concurrent clients with bursty activity handling the load OK. There are lots of areas where I can tweak for performance, but the fundamental limit I have at the moment would be a high number of active connections causing too much context switching. I can probably get 500-1000 requests/second from a single decent server, but beyond this the one thread per connection bites. See comments below on IOCP. – Misha Aug 22 '11 at 22:20
Indy is fine with 100-200 concurrent clients. But not with 1000, at least not for me. – Warren P Aug 23 '11 at 00:37
@Warren P, totally agree, because 1,000 clients means 1,000 server connection threads, and if they are all active, then that is a lot of context switching. – Misha Aug 23 '11 at 00:55
Mine reach 800-900 active connections at times, but if it's 1000 all the time, then the quick and easy solution is 2 servers at 500 each. You can set up the NLB cluster so the affinity is to a particular server (to keep a client connected to the same server). Bottom line: if you have 1000 transactions active, the piece that will get you is what you are trying to do with those 1000 transactions at the same time on the same server. Delphi is typically not good at taking advantage of a lot of cores. – Darian Miller Aug 23 '11 at 03:54
I have the actual data processing covered - I have a threading framework that will allow me to make complete use of as many cores as I have. And if the processing is stateless (sometimes possible) I can parallel multiple data processing threads just by updating the config file. It is the client connections that limit me. – Misha Aug 23 '11 at 06:11
Not just context switching; Indy's resource- and CPU-intensive logic is far too heavyweight for my purposes. I don't think you could speed Indy up by adding more code to it; a radical simplification is needed. – Warren P Aug 24 '11 at 02:24
@Warren P, not so in the case of the server connection threads, which is where the problem lies. Beware premature optimisation! I have an enormous amount of code around the Indy TCP implementation (over 10K LOC) and large overheads in my framework messaging, and for a local machine client/server connection I can still get 30Mbps throughput (actually benchmarked). Indy is good for well over 100Mbps and the processing overhead for an idling connection thread is absolutely minuscule. If you step through the Indy TCP server code (I have) you can confirm this. – Misha Aug 24 '11 at 03:43
Further to my comment above (I ran out of space), I just don't buy into this "I need a lightweight wrapper" argument. I have an absolutely "heavyweight" framework architecture, with limited performance tweaks at this time, and actual benchmarks prove it still capable of high data throughput (30Mbps+) and high messaging throughput (2,000+/second, that's around 200 million/day) on a single decent server. As on this occasion, it is usually larger architectural issues that are the ultimate performance bottlenecks for a well-architected system, not issues with low-level code. – Misha Aug 24 '11 at 03:56

score 0 · Answer 5 · answered Aug 22 '11 at 21:52

I've tried multiple Delphi solutions to do networking, and found that many if not all of them add complexity and code that hurts performance, footprint, or both. So I started searching for the lightest wrapper around the Winsock API. Then I (re)discovered Delphi's own TTcpClient and TTcpServer components. Used in blocking mode, with a TCustomTcpServer descendant overriding the DoAccept method, they have given me the best results so far.

If you expect to have a really high number of incoming connections and (small) responses to serve to (small) requests, it's highly advisable to implement I/O completion ports, as this handles incoming requests better.
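To illustrate why multiplexing helps with many small requests, here is a minimal sketch in Python using the standard `selectors` module. It is not IOCP and not Delphi: `selectors` is readiness-based (select/epoll/kqueue), whereas I/O completion ports are completion-based. It only shows the shared idea that one thread can service many connections instead of one thread per connection. The function names, the `socketpair`-based demo and the idle timeout are assumptions made for the example.

```python
import selectors
import socket


def echo_until_idle(server_socks, idle_timeout=0.2):
    """Echo data on many sockets from ONE thread; return the number of messages handled."""
    sel = selectors.DefaultSelector()
    for sock in server_socks:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    handled = 0
    try:
        while sel.get_map():
            events = sel.select(idle_timeout)
            if not events:
                break  # nothing became readable within the timeout: treat as idle
            for key, _mask in events:
                data = key.fileobj.recv(4096)
                if data:
                    key.fileobj.sendall(data)  # tiny reply, fits in the send buffer
                    handled += 1
                else:
                    sel.unregister(key.fileobj)  # peer closed the connection
    finally:
        sel.close()
    return handled


def demo(n=50):
    """Simulate n connected clients with socket pairs (no listening port needed)."""
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"msg {i}".encode())
    handled = echo_until_idle([server for _client, server in pairs])
    replies = [client.recv(4096).decode() for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return handled, replies


if __name__ == "__main__":
    count, replies = demo()
    print(count, replies[:3])
```

A production server would also need write buffering, partial-message handling and a worker pool for slow requests; those are left out to keep the sketch short.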

Yes, my solution adds a lot of complexity over Indy (probably about 10,000 lines of code), but it is well managed and rock-solid. I agree that I/O completion ports is probably the way to go, but I want to use a higher level of abstraction than building something myself. — Misha, Aug 22 '11 at 22:06

score 0 · Answer 6 · answered Aug 23 '11 at 15:53

I have been using ICS for the last 12 years. It is non-blocking. I support up to 2000 concurrent connections, each getting at least 5000 bytes per second, sent as 1000 bytes every 200 msec. I have never faced any problem, and CPU usage of the app is very small. There is good support in the forum, but I have not needed it at all.

Shekar
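For scale, the figures quoted in this answer can be turned into aggregate numbers with a few lines of Python. This is only arithmetic on the stated values (2000 connections, 1000 bytes every 200 msec); the function name is made up, and protocol overhead is ignored.

```python
def aggregate_load(connections, bytes_per_send, send_interval_ms):
    """Return (sends/sec, bytes/sec, Mbit/sec) summed over all connections.

    Hypothetical helper: payload bytes only, no TCP/IP header overhead.
    """
    sends_per_sec = connections * 1000 / send_interval_ms
    bytes_per_sec = sends_per_sec * bytes_per_send
    return sends_per_sec, bytes_per_sec, bytes_per_sec * 8 / 1_000_000


print(aggregate_load(2000, 1000, 200))  # (10000.0, 10000000.0, 80.0)
```

That is roughly 10,000 sends per second and 80 Mbit/s of payload in total, which is in the same range as the 10,000 messages per second asked about in the question.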
