I have built a multithreaded web crawler that sends requests to fetch web pages from their servers. Because it is multithreaded, it can overburden a server, and the server may then block the crawler (a politeness problem).
I want to add a minimum delay between consecutive requests to the same server. Would it be all right to store the minimum delay from each server's (domain's) robots.txt in a HashMap and compare it against the time of the last request made to that server?
What if no delay is specified in robots.txt?
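
To make the idea concrete, here is a minimal sketch of what I have in mind, not a finished implementation. The class name PolitenessScheduler, its method names, and the 1000 ms fallback are all my own placeholders. It uses a ConcurrentHashMap instead of a plain HashMap because several crawler threads would touch the map at once. It also assumes the delay comes from a Crawl-delay line (a non-standard directive that only some sites provide), and the parser ignores User-agent groups for brevity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: per-host minimum delay shared by all crawler threads.
final class PolitenessScheduler {
    // Assumed fallback (my own choice) when robots.txt gives no Crawl-delay.
    static final long DEFAULT_DELAY_MS = 1000L;

    // host -> minimum delay between requests, in milliseconds
    private final ConcurrentMap<String, Long> delayMs = new ConcurrentHashMap<>();
    // host -> earliest time (ms) at which the next request may start
    private final ConcurrentMap<String, Long> nextAllowedMs = new ConcurrentHashMap<>();

    void setDelay(String host, long millis) {
        delayMs.put(host, millis);
    }

    // Atomically reserves the next request slot for the host and returns
    // how many milliseconds the caller should wait before sending.
    long reserve(String host, long nowMs) {
        long delay = delayMs.getOrDefault(host, DEFAULT_DELAY_MS);
        long[] slot = new long[1];
        nextAllowedMs.compute(host, (h, next) -> {
            slot[0] = (next == null) ? nowMs : Math.max(nowMs, next);
            return slot[0] + delay; // the following request must come after this
        });
        return slot[0] - nowMs;
    }

    // Blocks the calling thread until its reserved slot for the host arrives.
    void acquire(String host) throws InterruptedException {
        long wait = reserve(host, System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }

    // Returns the first Crawl-delay value (given in seconds) as milliseconds,
    // or DEFAULT_DELAY_MS if none is present or it cannot be parsed.
    static long parseCrawlDelayMs(String robotsTxt) {
        for (String line : robotsTxt.split("\\r?\\n")) {
            String t = line;
            int hash = t.indexOf('#'); // drop trailing comments
            if (hash >= 0) {
                t = t.substring(0, hash);
            }
            t = t.trim();
            if (t.regionMatches(true, 0, "crawl-delay:", 0, 12)) {
                try {
                    double seconds = Double.parseDouble(t.substring(12).trim());
                    return Math.round(seconds * 1000);
                } catch (NumberFormatException e) {
                    // malformed value: keep scanning, then fall back to the default
                }
            }
        }
        return DEFAULT_DELAY_MS;
    }
}
```

Each worker thread would call acquire(host) just before sending a request. Reserving the slot inside compute() means two threads that hit the same host at the same moment get different slots, rather than both seeing that the last request was long enough ago.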