I have seen that in focused web crawling (a.k.a. topical web crawling), the evaluation metric - harvest ratio - is defined as: after crawling t pages, harvest_ratio(t) = number_of_relevant_pages / t.
So, for example, if after crawling 100 pages I get 80 true positives, then the harvest ratio of the crawler at that point is 0.8. But the crawler might have skipped pages during crawling that are totally relevant to the crawling domain, and these are not accounted for in the evaluation ratio. What is this situation called? Can we improve the evaluation metric to include the missed pages that are totally relevant? Is this consideration important?
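For concreteness, here is a minimal sketch of the two quantities involved. The harvest ratio is precision-like (relevant fetched / total fetched), while the missed-pages concern is recall-like (relevant fetched / all relevant), which can only be computed against a known ground-truth set of relevant pages. The function names and the toy page IDs below are my own illustration, not from any crawler library:

```python
def harvest_ratio(crawled_pages, is_relevant):
    """Fraction of crawled pages judged relevant (precision-like)."""
    if not crawled_pages:
        return 0.0
    relevant = sum(1 for p in crawled_pages if is_relevant(p))
    return relevant / len(crawled_pages)

def recall(crawled_pages, known_relevant):
    """Fraction of all known relevant pages the crawler actually fetched.

    This is the part the harvest ratio ignores: it requires knowing the
    full relevant set in advance, which is only feasible in controlled
    evaluations (e.g. a labeled seed corpus)."""
    if not known_relevant:
        return 0.0
    found = len(known_relevant & set(crawled_pages))
    return found / len(known_relevant)

# Toy example: 100 crawled pages, of which 80 are relevant,
# but the domain contains 120 relevant pages in total.
crawled = [f"r{i}" for i in range(80)] + [f"n{i}" for i in range(20)]
relevant_set = {f"r{i}" for i in range(120)}  # 40 relevant pages missed

print(harvest_ratio(crawled, lambda p: p in relevant_set))  # 0.8
print(recall(crawled, relevant_set))  # 80/120 = 0.666...
```

A crawler can score a high harvest ratio while missing most of the relevant pages, which is exactly why recall (or target recall against a sample of known relevant URLs) is usually reported alongside it in controlled experiments.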