I've written a scraping application which pulls a large number of pages from a site and parses them. This works well on Windows and is able to pull the pages quickly. However, using Mono on Linux, pulling each page is really slow. I've found that if I write the URLs to a file, I can fire up a wget process to pull the pages in bulk and then parse the files. But when I need cookies, other headers, and per-page processing before getting the next page, using wget is impractical.
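For reference, the wget workaround described above can be sketched roughly as follows. This is only an illustration in Python, not the original .NET code; the function names and the `urls.txt` file name are hypothetical, and it assumes `wget` is on the `PATH`.

```python
import subprocess
from pathlib import Path


def write_url_file(urls, url_file):
    # One URL per line, which is the format wget's -i option reads.
    Path(url_file).write_text("\n".join(urls) + "\n", encoding="utf-8")


def build_wget_command(url_file, out_dir):
    # -q: quiet, -i: read URLs from a file, -P: directory to save into.
    return ["wget", "-q", "-i", str(url_file), "-P", str(out_dir)]


def fetch_in_bulk(urls, out_dir):
    # Write the whole batch, then let a single wget process download it.
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    url_file = out_dir / "urls.txt"  # hypothetical file name
    write_url_file(urls, url_file)
    # wget exits non-zero if any single URL fails, so do not raise here.
    subprocess.run(build_wget_command(url_file, out_dir), check=False)
    # Hand back the downloaded files for parsing, skipping the URL list.
    return [p for p in out_dir.iterdir() if p != url_file]
```

Because the whole batch is handed to wget up front, there is no hook for inspecting one page before deciding on the next request, which is the limitation mentioned above.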

I've searched at length, and the closest I've come to the problem is here, but that still doesn't offer a solution for Linux.

I understand there are different routes, but that is unimportant, since wget can pull the files at blistering speed whereas WebClient / HttpClient cannot.

What can I do to try to solve this bizarre and unexpected problem?

user3791372
  • What version of Mono are you using? – knocte Feb 14 '17 at 09:57
  • 4.6.2. I wrote a series of tests here: http://stackoverflow.com/questions/42221723/mono-and-webrequest-speeds-a-test – user3791372 Feb 14 '17 at 10:08
  • I would recommend you do 3 things: 1. Test with Mono 4.8.x. If it's still slow, then: 2. Test with the Mono master branch (I believe it contains some threadpool improvements). If it's still slow, then: 3. File a bug report at http://bugzilla.xamarin.com/ including your test cases. – knocte Feb 15 '17 at 02:27

0 Answers