
I'm trying to write a simple C# application that downloads a large number of small files from an FTP server.

I've tried two approaches:

1 - generic socket programming

2 - using FtpWebRequest and FtpWebResponse objects

The download time for the same file varies from 1.5 s to 7 s with the first approach, while the second gives more or less the same result every time: about 2.5 s.

Considering that about 1.4 s of those 2.5 s is spent initializing the FtpWebRequest object (receiving the data takes only 1.1 s), the difference is quite significant.

The question is: how can I get the same good, stable download speed with the first approach as with the second?

With the first approach, the problem seems to lie in the loop below (it takes about 90% of the download time):

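```csharp
// Read from the data connection until the server closes it (Receive returns 0).
Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
while (intResponseLength != 0)
{
    // Write only the bytes actually received by this call.
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None);
}
```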

The equivalent code for the second approach (it always takes about 1.1 s for a particular file):

```csharp
// Same loop, reading from the FtpWebResponse stream; Read returns 0 at end of stream.
Int32 intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
while (intResponseLength != 0)
{
    localFile.Write(buffer, 0, intResponseLength);
    intResponseLength = ftpStream.Read(buffer, 0, intBufferSize);
}
```
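For context, the `ftpStream` in the loop above would typically come from an `FtpWebRequest` configured roughly as below. This is a minimal sketch, not the poster's code: the class and method names (`FtpDownloadSketch`, `DownloadFile`, `CopyAll`) and the 32 kB buffer are hypothetical. `KeepAlive = true` (already the default) keeps the control connection open after a request completes; whether that reuse trims part of the ~1.4 s per-request setup measured above is an untested assumption. On .NET 6 and later, `WebRequest.Create` produces an obsolescence warning (SYSLIB0014) but still works.

```csharp
using System;
using System.IO;
using System.Net;

public static class FtpDownloadSketch
{
    // Copies everything from source to destination with the same
    // read-until-zero loop used in the question. Returns the byte count.
    public static long CopyAll(Stream source, Stream destination, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read = source.Read(buffer, 0, bufferSize);
        while (read != 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
            read = source.Read(buffer, 0, bufferSize);
        }
        return total;
    }

    // Hypothetical helper: downloads one file with FtpWebRequest.
    public static long DownloadFile(Uri fileUri, NetworkCredential credentials, string localPath)
    {
        var request = (FtpWebRequest)WebRequest.Create(fileUri);
        request.Method = WebRequestMethods.Ftp.DownloadFile;
        request.Credentials = credentials;
        request.UseBinary = true;   // binary transfer mode (default)
        request.UsePassive = true;  // passive data connections (default)
        request.KeepAlive = true;   // keep the control connection open (default)

        using (var response = (FtpWebResponse)request.GetResponse())
        using (Stream ftpStream = response.GetResponseStream())
        using (FileStream localFile = File.Create(localPath))
        {
            return CopyAll(ftpStream, localFile, 32 * 1024);
        }
    }
}
```

A call such as `FtpDownloadSketch.DownloadFile(new Uri("ftp://example.com/dir/file.bin"), new NetworkCredential("user", "password"), "file.bin")` would fetch one file; the host, path, and credentials there are placeholders.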

I've tried buffer sizes from 56 B to 32 kB, with no significant difference.
I also tried creating a stream on the open data socket:

```csharp
Stream str = new NetworkStream(dataSocket);
```

and reading from it (instead of calling dataSocket.Receive):

```csharp
str.Read(buffer, 0, intBufferSize);
```

That doesn't help; in fact, it's even slower.
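For comparison, the `NetworkStream` variant can also let `Stream.CopyTo` (available since .NET Framework 4.0) drive the read loop. This is only a sketch of the API shape, not a claim that it is faster. It assumes `dataSocket` is already connected and that the server closes the data connection at end of file; the class and method names and the 32 kB buffer size are illustrative.

```csharp
using System.IO;
using System.Net.Sockets;

public static class NetworkStreamSketch
{
    // Copies all data from a connected socket into destination until the
    // peer closes its side (the underlying Read then returns 0).
    public static void ReceiveToStream(Socket dataSocket, Stream destination)
    {
        // ownsSocket: false leaves closing the socket to the caller.
        using (var networkStream = new NetworkStream(dataSocket, ownsSocket: false))
        {
            networkStream.CopyTo(destination, 32 * 1024);
        }
    }
}
```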

Thanks in advance for any suggestion!

user587356

1 Answer


You need to use the Socket.Poll or Socket.Select method to check whether data is available. What you're doing not only slows down the operation but also causes excessive CPU load. Poll or Select will yield processor time until data is available or the timeout elapses. You can keep the same loop but add a call to one of these methods, and experiment with the timeout (try values from 10 ms to 500 ms to find the one that works best for your task).
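The answer names `Socket.Select` as an alternative to `Poll`. Below is a minimal sketch of that variant, not code from the answer: it assumes `dataSocket` is already connected and that the server closes it at end of file, and the class and method names, the 32 kB buffer, and the idle limit are illustrative choices. One detail specific to `Select`: it removes sockets that are not ready from the list passed in, so the list is rebuilt before every call. The timeout is given in microseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net.Sockets;

public static class SelectReceiveSketch
{
    // Receives until the peer closes the connection, waiting for readability
    // with Socket.Select before every Receive call.
    public static long ReceiveAll(Socket dataSocket, Stream localFile,
                                  int selectTimeoutMicroseconds, TimeSpan idleLimit)
    {
        byte[] buffer = new byte[32 * 1024];
        long total = 0;
        Stopwatch idle = Stopwatch.StartNew(); // time since data last arrived
        var readList = new List<Socket>();

        while (true)
        {
            // Select drops sockets that are not ready, so refill the list each time.
            readList.Clear();
            readList.Add(dataSocket);
            Socket.Select(readList, null, null, selectTimeoutMicroseconds);

            if (readList.Count == 0)
            {
                // Nothing arrived in this interval, so there is nothing to save.
                if (idle.Elapsed > idleLimit)
                    throw new TimeoutException("No data received within the idle limit.");
                continue;
            }

            int received = dataSocket.Receive(buffer, buffer.Length, SocketFlags.None);
            if (received == 0)
                return total; // peer closed the data connection: transfer complete

            localFile.Write(buffer, 0, received);
            total += received;
            idle.Restart();
        }
    }
}
```

For example, `SelectReceiveSketch.ReceiveAll(dataSocket, localFile, 100000, TimeSpan.FromSeconds(30))` waits in 100 ms steps (inside the 10 to 500 ms range suggested above) and gives up after 30 seconds without data.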

Eugene Mayevski 'Callback
  • The excessive CPU load can be more or less eliminated by using an explicit `Thread.Sleep` to make the busy loop far less... busy, at least down to whatever the minimum sleep frequency is on the platform (<< 1000 Hz), and perhaps even less busy than that. Not that I'm advocating against another method, but it's worth noting. –  Jan 24 '11 at 21:40
  • @pst Sleep will slow down the transfer: what is the point of sleeping if there is data in the buffer waiting to be processed? – Eugene Mayevski 'Callback Jan 24 '11 at 21:44
  • Eugene, thanks for your answer. However, it didn't work for me ;-( I modified my code as below (maybe a little too excessively...?): – user587356 Jan 25 '11 at 12:26
  • The code: `if (dataSocket.Poll(500000, SelectMode.SelectRead)) //0.5s { Int32 intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None); while (intResponseLength != 0) { localFile.Write(buffer, 0, intResponseLength); if (dataSocket.Poll(500000, SelectMode.SelectRead)) { intResponseLength = dataSocket.Receive(buffer, intBufferSize, SocketFlags.None); } } }` – user587356 Jan 25 '11 at 12:27
  • The time needed to download a given file stays the same. Am I doing it the wrong way? Is there anything else you could suggest? Should I use an asynchronous approach? Cheers! – user587356 Jan 25 '11 at 12:28
  • @user587356 Your modified code is not correct; moreover, it will produce wrong data if Poll returns false (you'll be saving the same buffer two or more times). – Eugene Mayevski 'Callback Jan 25 '11 at 15:29
  • Yep, I noticed that. Suddenly a 200 kB file grew to over 3 MB. Oddly enough, the download time was still the same. I just wanted to be sure I used the Poll method in every possible way to increase performance. So it should be placed before entering the loop. But, as I said, the problem remains. Can you suggest other possibilities, if there are any? – user587356 Jan 26 '11 at 05:49
  • @user587356 No, it shouldn't. It should be called before each call to Receive. – Eugene Mayevski 'Callback Jan 26 '11 at 08:26
  • OK, I put dataSocket.Poll before every call to Receive; if there is a timeout, I throw an exception. This works fine, but it is still "randomly" slow... – user587356 Jan 26 '11 at 12:08
  • @user587356 You don't need to throw an exception. If the timeout expires, just continue the loop (but of course don't save any data, as you haven't received any), unless you know that the "large timeout" (30-60 seconds) is over. – Eugene Mayevski 'Callback Jan 26 '11 at 12:16
  • It's finally done ;-) Thank you for your help with the Poll method. It's still slow, though... Should I change anything else? – user587356 Jan 26 '11 at 13:57
  • @user587356 Further investigation would require analysis of your complete project, which is beyond StackOverflow's capabilities (and goals). – Eugene Mayevski 'Callback Jan 26 '11 at 14:07
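The comments converge on a specific loop: call `Poll` before every `Receive`, treat a `Poll` timeout as "no data yet" (write nothing and loop again), and abort only once a much longer timeout (30-60 seconds) has passed. Below is a minimal sketch of that loop, assuming an already-connected `dataSocket` that the server closes at end of file; the names, the 32 kB buffer, and reading the long timeout as "time since data last arrived" are assumptions. Writing only the `received` bytes after a successful `Receive` is what avoids the duplicated-buffer problem from the earlier attempt. Note that on a socket in the default blocking mode, `Receive` already waits until data arrives, so `Poll` here mainly provides a bounded wait rather than a speed-up.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Sockets;

public static class PollReceiveSketch
{
    // Poll before every Receive; a short Poll timeout just means "try again",
    // and only a long idle period aborts the transfer.
    public static long ReceiveAll(Socket dataSocket, Stream localFile,
                                  int pollTimeoutMicroseconds, TimeSpan idleLimit)
    {
        byte[] buffer = new byte[32 * 1024];
        long total = 0;
        Stopwatch idle = Stopwatch.StartNew(); // time since data last arrived

        while (true)
        {
            if (!dataSocket.Poll(pollTimeoutMicroseconds, SelectMode.SelectRead))
            {
                // Poll timed out: nothing was received, so nothing is written.
                if (idle.Elapsed > idleLimit)
                    throw new TimeoutException("Data connection idle for too long.");
                continue;
            }

            // SelectRead is also true once the peer has closed the connection;
            // Receive then returns 0 and the download is complete.
            int received = dataSocket.Receive(buffer, buffer.Length, SocketFlags.None);
            if (received == 0)
                return total;

            localFile.Write(buffer, 0, received);
            total += received;
            idle.Restart();
        }
    }
}
```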