I am crawling, for example, 1000 websites. When I run readdb, some URLs show the status db_redir_temp or db_redir_perm. If I set http.redirect.max=10, does this value apply to each website, or does it allow only 10 redirects across the entire crawl?
1 Answer
`http.redirect.max` is defined as:
The maximum number of redirects the fetcher will follow when trying to fetch a page. If set to negative or 0, fetcher won't immediately follow redirected URLs, instead it will record them for later fetching.
The number applies to the redirects of a single web page, not to the crawl as a whole. 10 is a really generous limit; 3 should be enough in most cases, given that the redirect target will be tried in one of the later fetch cycles anyway. Note that the redirect source is always recorded in the CrawlDb as `db_redir_perm` or `db_redir_temp`.
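To illustrate the per-page counting, here is a rough Python model. It is only a sketch and not Nutch's actual fetcher code: `fetch_with_redirects`, the `redirects` map, and the URLs are hypothetical names made up for the example, and what happens to a chain that exceeds the limit is deliberately left out.

```python
from typing import Dict, List, Tuple


def fetch_with_redirects(
    start_url: str,
    redirects: Dict[str, str],
    redirect_max: int,
) -> Tuple[List[str], List[str]]:
    """Toy model: return (URLs fetched in this cycle, URLs recorded for later)."""
    fetched = [start_url]
    deferred: List[str] = []
    url = start_url
    followed = 0  # the counter is local to this one page, not shared by the crawl
    while url in redirects:
        target = redirects[url]
        if redirect_max <= 0:
            # Redirects are not followed immediately; the target is recorded.
            deferred.append(target)
            break
        if followed >= redirect_max:
            # Limit reached for this chain; further handling is not modelled.
            break
        followed += 1
        url = target
        fetched.append(url)
    return fetched, deferred


# Two hypothetical pages, each with its own chain of three redirects.
redirects = {
    "a0": "a1", "a1": "a2", "a2": "a3",
    "b0": "b1", "b1": "b2", "b2": "b3",
}

for page in ("a0", "b0"):
    print(page, fetch_with_redirects(page, redirects, redirect_max=3))
```

With a limit of 3, both chains are followed to the end in the same cycle, because each page starts counting from zero. With `redirect_max=0`, only the start URL is fetched and its redirect target is recorded for a later cycle.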

Sebastian Nagel
-
So does this mean that each web page will follow up to 10 redirects in the current crawl cycle, or will they be followed in the next cycle? @Sebastian Nagel – Ravi Kiran Oct 19 '20 at 05:21
-
Yes. Up to 10 redirects in a chain are followed directly by the fetcher if `http.redirect.max` is set to 10. – Sebastian Nagel Oct 19 '20 at 09:07
-
So are the redirects followed in the same crawl cycle or in the next one? – Ravi Kiran Oct 19 '20 at 10:37
-
Yes, the redirects are followed in the same cycle. – Sebastian Nagel Oct 19 '20 at 14:57