I have a file with a million URLs. The data file looks like this:
http://wonderland.cjfallon.ie/
http://www.youtube.com/
http://www.starfall.com/
http://education.scholastic.co.uk/
http://www.scoilnet.ie/
http://www.nessy.com/
http://www.senteacher.org/
http://scoop.it/
http://www.moviemaker.com/
http://learni.st/
http://www.twitter.com/
http://www.facebook.com/
http://www.gutenberg.org/
http://www.gutenberg.org/cache/epub/42361/pg42361.txt
I want to crawl them. The bottleneck is network I/O, so I want to use multiple threads or gevent to handle it.
My multi-threaded code works well: https://gist.github.com/young001/5449751
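To make the setup concrete, the threaded approach can be sketched roughly as below. This is only an illustration and not the code in the gist: it assumes Python 3 and the standard library, and the names `check_url` and `crawl`, the 10-second timeout, and the pool size of 20 are made up for this sketch.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check_url(url, timeout=10):
    """Return (url, status): an HTTP status code, or a 'down: ...' string."""
    try:
        # Note: urlopen follows redirects, so a 301 is not reported as such here.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return url, resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        return url, exc.code
    except Exception as exc:
        # DNS failure, timeout, refused connection, malformed URL, ...
        return url, "down: %s" % exc


def crawl(urls, fetch=check_url, workers=20):
    """Run fetch over urls with a bounded pool of threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


# Hypothetical usage, assuming the URLs are in a file called urls.txt:
#     with open("urls.txt") as f:
#         urls = [line.strip() for line in f if line.strip()]
#     for url, status in crawl(urls):
#         print("status is", status, url)
```

The explicit timeout matters in a sketch like this: without one, a single unresponsive server can hold a worker indefinitely, which would look like a stall.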
But when I use gevent (code: https://gist.github.com/young001/baa3eebbf7342c5ac077), it always goes wrong. The output looks like this:
status is 200
status is 200
Internal error in evhttp
the url is down http://web2.socialcomputingmagazine.com/the_social_graph_issues_and_strategies_in_2008.htm
the reason
status is 200
status is 200
status is 200
status is 200
status is 200
status is 200
status is 301
status is 200
status is 301
status is 200
status is 200
Internal error in evhttp
After that it stalls, and I don't know why this happens.
Everything looks like it should work, but it doesn't, and it is driving me crazy.
Any help?