
Which Perl module would you recommend for a thread-safe HTTP client?
[Alternative: an event-based, multi-session HTTP client]

Extra requirements:

  1. Support for retries (resuming partially downloaded big files)
  2. Shared cookies
  3. Provided as Debian package
  4. Multi-session parallel download of big files [less important]
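For requirement 4, a common approach (not tied to any module named in this thread) is to split the file into contiguous byte ranges and fetch each one in its own session with a `Range` header. The sketch below is Python purely to illustrate the arithmetic; it assumes the total size is already known (e.g. from the `Content-Length` of a `HEAD` response) and that the server supports ranges (`Accept-Ranges: bytes`). The names `split_ranges` and `range_header` are hypothetical helpers, not part of any library.

```python
from __future__ import annotations


def split_ranges(total_size: int, sessions: int) -> list[tuple[int, int]]:
    """Split bytes [0, total_size) into at most `sessions` contiguous ranges.

    Each tuple is (first_byte, last_byte); HTTP byte ranges are inclusive.
    """
    if total_size <= 0 or sessions <= 0:
        return []
    sessions = min(sessions, total_size)  # never ask for empty ranges
    base, extra = divmod(total_size, sessions)
    ranges = []
    start = 0
    for i in range(sessions):
        # The first `extra` sessions take one extra byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Header asking the server for bytes start..end inclusive."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    for start, end in split_ranges(10_000, 3):
        print(range_header(start, end))
```

Each session writes its part at offset `first_byte` in the output file; a failed part can be retried on its own.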
AnFi

2 Answers


Don't use multiple threads; that will just slow things down and give you headaches. Instead, use an existing engine that can perform parallel requests.

For example, Net::Curl::Multi and WWW::Curl::Multi provide access to libcurl, a proven, powerful and fast engine. (I've used the former in production.) You can still process the responses in different threads if you so desire.

AnyEvent::HTTP and AnyEvent::Curl::Multi are two other such engines. However, using these would add a lot of overhead (which may affect performance and robustness), and I don't know how well the various event loops deal with threaded environments.

If any of these modules lacks a Debian package, just create one!

ikegami
  • Is `WWW::Curl::Multi` "sufficiently similar"? [It is provided as a Debian package.] – AnFi Sep 18 '17 at 18:24
  • I have no experience with WWW::Curl, but it is also an interface to libcurl. Net::Curl came out of WWW::Curl, but the docs don't say why. Both have been updated since. – ikegami Sep 18 '17 at 18:29
  • Huh? The second answer doesn't mention AnyEvent::Curl::Multi. Mine does (by virtue of saying you should use a module like AnyEvent::Curl::Multi). I've edited the question to specifically mention that module. I don't know why you'd want all the complexity of AnyEvent? – ikegami Sep 19 '17 at 17:39
  • I prefer to use Perl modules provided as Debian packages. Your answer is better, **but** the other answer was the first to fit the environment I use and prefer. [I have upvoted your answer.] – AnFi Sep 19 '17 at 17:45
  • So you're saying you're going to use AnyEvent::HTTP? That's not what you said above. You said you were going to use AnyEvent::Curl::Multi. And you said that WWW::Curl::Multi is provided as a Debian package. So confused! – ikegami Sep 19 '17 at 17:46
  • **My earlier comment clarified**: I have decided to accept the second answer because AnyEvent::Curl::Multi **IS NOT** provided as a Debian package. WWW::Curl::Multi seems "event-loop-like". I have chosen full event-loop Perl packages (AnyEvent + AnyEvent::HTTP) provided as Debian packages. – AnFi Sep 19 '17 at 17:55
  • Again, the second answer doesn't mention AnyEvent::Curl::Multi, and you've accepted my answer of not using one request per thread. – ikegami Sep 19 '17 at 17:59
  • By the way, it makes no sense to use a "full event loop" to just wait for a completely different event loop. Now, if you're going to use it instead of using threads for other things, that's fine. If so, you might want to throw Coro into the mix. It's a user-land co-operative multi-tasking system* that's compatible with AnyEvent. (* - This means it runs single-processor, and one "thread" can block all others for extensive periods of time.) Using Coro can also help make your code look more normal. – ikegami Sep 19 '17 at 18:04

AnyEvent::HTTP supports whatever event loop you would like to use.

Pascal
  • Does `AnyEvent::HTTP` support requirement 1 (retries) without extra coding? – AnFi Sep 18 '17 at 20:22
  • @Andrzej A. Filip, not sure what you mean by that. Clients won't automatically retry, so you'll have to make a request for the missing range regardless of the client. – ikegami Sep 18 '17 at 20:29
  • @ikegami Is it simple/trivial to construct a request for the missing part? [Command-line curl offers an option to download a specific range.] – AnFi Sep 18 '17 at 21:29
  • No matter what client you used, it's just a question of adding a header. You also need to check if the response is a complete response or just the requested range. – ikegami Sep 18 '17 at 21:32
  • See [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests). It uses `curl` in its examples, but nothing about the page is specific to `curl`. – ikegami Sep 18 '17 at 21:58
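ikegami's point that resuming is "just a question of adding a header", plus checking whether the reply is the full file or only the requested range, can be sketched as follows. It is written in Python only to illustrate the HTTP logic described on the MDN page; it makes no network calls. The function names are hypothetical, and you would feed in the status code and headers that your actual client (AnyEvent::HTTP, Net::Curl, etc.) returns. It handles only a single open-ended range, so `multipart/byteranges` replies are out of scope.

```python
from __future__ import annotations

import re

# Content-Range for a single range: "bytes <first>-<last>/<total or *>"
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def resume_headers(bytes_on_disk: int) -> dict[str, str]:
    """Request everything from the first missing byte to the end of the file."""
    return {"Range": f"bytes={bytes_on_disk}-"}


def classify_response(status: int, headers: dict[str, str], bytes_on_disk: int) -> str:
    """Decide what to do with the body of a resume attempt.

    Returns "append" when the server sent the requested tail (206), or
    "overwrite" when it ignored the Range header and sent the whole file (200).
    Raises ValueError for anything else (e.g. 416 Range Not Satisfiable).
    """
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 200:
        # Range was ignored: the body is the complete file, so start over.
        return "overwrite"
    if status == 206:
        match = _CONTENT_RANGE.fullmatch(lowered.get("content-range", "").strip())
        if match is None:
            raise ValueError("206 response without a usable Content-Range")
        start = int(match.group(1))
        if start != bytes_on_disk:
            raise ValueError(f"server resumed at byte {start}, expected {bytes_on_disk}")
        return "append"
    raise ValueError(f"unexpected status {status}")
```

Checking the start offset in `Content-Range`, not just the 206 status, guards against appending bytes at the wrong position.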