
A multithreaded download application that we wrote sometimes gets HTTP 408 errors. We suspect this happens because the user increased the number of threads. Our theory is that a context switch occurs during a network call, so the thread does not send all the packets the request needs, and the server responds with a 408 timeout. Is that possible, or are network calls independent of context switches?

We are using Python threads and the pycurl module to download the data with 120 threads.

Amrit
  • Python context switching doesn't affect system calls. More likely the time *between* the network calls is excessive due to too many threads and too much CPU load. – user207421 May 12 '21 at 07:48
  • Okay, so if context switches don't affect the network calls, then having n=120 threads should not be a problem. I don't understand what you mean by time between network calls. – Amrit May 12 '21 at 07:59
  • I am sorry, let's go over it again. The system calls are not really impacted (including the network calls, according to you). If that is the case, then what do you mean by time between network calls? That statement is an oxymoron. Too much CPU load cannot be caused by 120 threads running on a system with 64 cores and 128 GB of memory. The program uses less than 0.2% of CPU and 300 MB of memory. – Amrit May 12 '21 at 11:08
  • "*We think it may be possible that during the network call, context switch happens and that results in thread not sending all the packets required for the call to succeed and server gives a 408 timeout error.*" Can you give us some idea why you think that? What's the basis for the belief that a context switch can cause incomplete outbound data? (Is it because you know your code makes bogus assumptions about how TCP data will be packetized? Because, if so, the problem is the assumptions in the code, not the context switches!) – David Schwartz May 12 '21 at 23:56
  • As I said, I am not sure why this is happening. We have no control over the server, so we cannot find out why it is sending 408. One line of thought was that, since we have a lot of threads and the GIL (global interpreter lock) lets only one thread work at a time, context switching might be the problem. But as you pointed out, context switches have no effect on the network calls. We also used a packet sniffer to make sure all our outbound calls go out with complete data. Maybe this is a scalability issue on the server we are calling. Thanks for the support. – Amrit May 13 '21 at 12:11

1 Answer

What context switch are you referring to?

pycurl is a wrapper around libcurl, which is a C library. When any of libcurl's methods are called, Python's global interpreter lock is released. The Python runtime may run its other threads while libcurl is performing client-side processing, or during actual network I/O, but "context switches" aren't attached to network I/O in any way.
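To see that blocked network calls in different threads overlap rather than queue behind the GIL, here is a minimal sketch using only the standard library instead of pycurl. The local "slow server", the `DELAY` and `N_THREADS` values, and all function names are made up for illustration. If the threads really ran one network call at a time, the total would be close to `N_THREADS * DELAY`. In practice it stays close to a single `DELAY`.

```python
import socket
import threading
import time

DELAY = 0.2      # seconds the hypothetical slow server waits before replying
N_THREADS = 20   # illustrative only; the question used 120


def slow_server(listener, n_clients):
    """Accept n_clients connections and answer each after DELAY seconds."""
    def handle(conn):
        with conn:
            conn.recv(1024)
            time.sleep(DELAY)
            conn.sendall(b"ok")

    for _ in range(n_clients):
        conn, _addr = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()


def client(port, results, i):
    """Send one request and block until the reply arrives."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"ping")
        # recv() blocks in the kernel; CPython releases the GIL while waiting.
        results[i] = s.recv(1024)


def run_demo():
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(N_THREADS)
    port = listener.getsockname()[1]
    threading.Thread(
        target=slow_server, args=(listener, N_THREADS), daemon=True
    ).start()

    results = [None] * N_THREADS
    start = time.monotonic()
    threads = [
        threading.Thread(target=client, args=(port, results, i))
        for i in range(N_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.monotonic() - start
    listener.close()
    return elapsed, results


if __name__ == "__main__":
    elapsed, results = run_demo()
    print(f"{N_THREADS} blocking calls finished in {elapsed:.2f}s")
```

This only shows that waiting on the network does not serialize the threads. It says nothing about how the remote server copes with that many simultaneous requests.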

This shouldn't affect your troubleshooting, however. You should be using a packet capture tool such as tcpdump to record your traffic and identify which side is sending what, and when, at the time of the errors. That lets you form reasoned theories about what might be going on instead of guessing.
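As a complement to a packet capture (not a replacement for it), you can also have libcurl report what it sends and receives on each handle. The sketch below assumes pycurl's `VERBOSE` and `DEBUGFUNCTION` options. The `TraceLog` class and `attach_trace` helper are hypothetical names introduced here, and the numeric type codes are taken from libcurl's `curl_infotype` enum.

```python
import threading
import time

# libcurl curl_infotype values 0..4 mapped to short labels.
INFO_LABELS = {
    0: "info",
    1: "header-in",
    2: "header-out",
    3: "data-in",
    4: "data-out",
}


class TraceLog:
    """Thread-safe collector for libcurl debug events (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []

    def callback(self, debug_type, debug_msg):
        label = INFO_LABELS.get(debug_type)
        if label is None or label.startswith("data"):
            return  # skip bodies and TLS payloads; keep info text and headers
        if isinstance(debug_msg, bytes):
            debug_msg = debug_msg.decode("latin-1")
        event = (
            time.time(),
            threading.current_thread().name,
            label,
            debug_msg.rstrip(),
        )
        with self._lock:
            self.events.append(event)


def attach_trace(curl, log):
    """Route a pycurl.Curl handle's debug output into log."""
    import pycurl  # imported here so TraceLog itself has no pycurl dependency

    curl.setopt(pycurl.VERBOSE, 1)  # the debug callback only fires when verbose
    curl.setopt(pycurl.DEBUGFUNCTION, log.callback)
```

Calling `attach_trace(c, log)` on each handle before `c.perform()` gives you timestamped request and response headers per thread. You can then line those up with the 408 responses and with the tcpdump capture.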

D. SM
  • Thanks for the help. We found out that this was not an issue on our side, and that the server may have a problem where a large number of calls has an adverse effect on its performance. – Amrit May 13 '21 at 12:13