Why does adding non-relevant documents improve system performance? And how should the new result be evaluated?

Question

Suppose an IR system returns a ranked list of 20 documents in response to a query over a collection of 10,000 documents. If 5,000 non-relevant documents are added to the collection, we find that the same ranked list is returned for the query. That means the new setting, i.e., a collection of 15,000 documents, does not change recall or precision on the 20 results. However, the system seems to perform better in the new setting, because it has to deal with more non-relevant documents.

score 0 · Answer 1 · edited Jun 25 '22 at 08:44

I'll answer this question based on my own reasoning:

<table border="1">
<tr>
  <td> </td>
  <td>relevant</td>
  <td>nonrelevant</td>
  <td> </td>
</tr>
<tr>
  <td>retrieved</td>
  <td>tp</td>
  <td>fp</td>
  <td>fixed</td>
</tr>
<tr>
  <td>not retrieved</td>
  <td>fn</td>
  <td>tn</td>
  <td> </td>
</tr>
<tr>
  <td> </td>
  <td> </td>
  <td>tn increases</td>
  <td> </td>
</tr>
</table>

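Adding non-relevant documents is equivalent to increasing tn, while tp, fp and fn stay the same. Precision, tp/(tp+fp), and recall, tp/(tp+fn), do not involve tn, so they cannot change. A measure that does involve tn could therefore be used instead, for example fn/(fn+tn): the fraction of non-retrieved documents that are relevant, which decreases as tn grows.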
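To make this concrete, here is a minimal Python sketch of the before and after settings. The question does not say how many documents are relevant, so the values `relevant_retrieved=8` and `relevant_total=40` are made-up assumptions for illustration only, and the `Confusion` class and `confusion` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # relevant and retrieved
    fp: int  # nonrelevant and retrieved
    fn: int  # relevant and not retrieved
    tn: int  # nonrelevant and not retrieved

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def missed_ratio(self) -> float:
        # The measure proposed above: fn / (fn + tn)
        return self.fn / (self.fn + self.tn)


def confusion(collection_size: int, retrieved: int,
              relevant_retrieved: int, relevant_total: int) -> Confusion:
    tp = relevant_retrieved
    fp = retrieved - tp
    fn = relevant_total - tp
    tn = collection_size - tp - fp - fn
    return Confusion(tp, fp, fn, tn)


# Assumed numbers: 8 of the 20 results are relevant, 40 relevant documents overall.
before = confusion(10_000, retrieved=20, relevant_retrieved=8, relevant_total=40)
after = confusion(15_000, retrieved=20, relevant_retrieved=8, relevant_total=40)

for name, c in (("before", before), ("after", after)):
    print(name, c.precision(), c.recall(), c.missed_ratio())
```

Under these assumptions precision and recall are identical in both settings, while fn/(fn+tn) gets smaller after the 5,000 non-relevant documents are added, because only tn changes.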
