I'm looking at Donne Martin's design for a web crawler.
He suggests the following network optimizations:
- "The Crawler Service can improve performance and reduce memory usage by keeping many open connections at a time, referred to as connection pooling"
- "Switching to UDP could also boost performance"
I don't understand either suggestion:

1. What does connection pooling have to do with web crawling? Isn't each crawler service opening its own connection to the host it's currently crawling? What good would connection pooling do here?
2. And about UDP: isn't crawling just issuing HTTP-over-TCP requests to web hosts? How is UDP relevant here?
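For reference, here's my current mental model of connection pooling, as a minimal sketch (the `ConnectionPool` class and its `get` method are my own illustration, not anything from the design doc): keep one open connection per host and reuse it, instead of paying a fresh TCP (and TLS) handshake for every page fetched from that host.

```python
import http.client

class ConnectionPool:
    """Minimal per-host connection pool (illustrative sketch):
    reuse an already-open connection to a host rather than
    opening a new one for each request."""

    def __init__(self, factory=http.client.HTTPSConnection):
        # factory creates a new connection for a host;
        # defaults to a plain stdlib HTTPS connection.
        self._factory = factory
        self._pool = {}

    def get(self, host):
        # Reuse the existing connection for this host if we have one,
        # otherwise open (and remember) a new one.
        if host not in self._pool:
            self._pool[host] = self._factory(host)
        return self._pool[host]
```

Is the idea that a crawler fetching many pages from the same host benefits from this reuse, or is the pool meant to be shared across crawler workers? That part is what I'm unsure about.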