There was a similar question about this, but the asker was (probably?) satisfied with knowing about precision, recall and the F1 score, so I'll extend it:
To compute precision & recall, you need the TP, FN, TN and FP values. Out of the box, after a crawl, you know:
- TP + FP (those were selected as relevant)
- TN + FN (the rest which were crawled and discarded)
The hard part seems to be splitting those sums by identifying the truly relevant pages in the crawled set, i.e. obtaining TP and FN individually rather than only their sums.
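For reference, here is a minimal sketch of the metrics I mean, assuming the four counts were somehow obtained (the example numbers are made up):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw counts; note that TN is not needed for either."""
    precision = tp / (tp + fp)  # fraction of selected pages that are truly relevant
    recall = tp / (tp + fn)     # fraction of truly relevant crawled pages that were selected
    return precision, recall

# e.g. 80 true positives, 20 false positives, 40 false negatives
print(precision_recall(80, 20, 40))  # → (0.8, 0.6666666666666666)
```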
I can verify a document's relevancy manually, independently of the crawler's relevancy function, which is the thing actually being tested. In my case that function is the cosine similarity between the TF-IDF vectors of the crawled page and a user-given on-topic document.
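To make that concrete, here is a stdlib-only sketch of the kind of relevancy function I mean; the tokenization and idf smoothing choices below are my own simplifications, not from any particular library:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (term -> weight dicts) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()  # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # smoothed idf so terms present in every document still contribute
        vec = {t: (tf[t] / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t in tf}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

The crawler then marks a crawled page as relevant when its similarity to the on-topic document exceeds some threshold.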
As I want to test it on more than a couple hundred crawled pages, how can I evaluate correctness using precision and recall without manually verifying every crawled page? Also, is there any other way to evaluate a focused web crawler?