Short answer: It's the list ordered by the predictions from the model at the current iteration.
Let's see why it makes sense.
At each training iteration, we perform the following steps (this is the standard loop for gradient-trained models, whether the task is classification, regression or ranking); a minimal sketch of one such iteration follows the list:
- Calculate scores s[i] = f(x[i]) returned by our model for each document i.
- Calculate the gradient of RankNet's cost C with respect to the model's weights, ∂C/∂w, by back-propagation. This gradient is the sum of the pairwise gradients ∂C[i, j]/∂w, computed for every pair of documents (i, j).
- Perform a gradient descent step (i.e. w := w - u * ∂C/∂w, where u is the step size), since C is a cost we want to minimize.
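To make this concrete, here is a minimal sketch of one such iteration, assuming a linear scoring model f(x) = w·x and the standard RankNet pairwise cross-entropy cost; the function name, the `pairs` structure and the hyperparameters are purely illustrative:

```python
import numpy as np

def ranknet_iteration(w, X, pairs, sigma=1.0, lr=0.01):
    """One RankNet-style step for a linear model f(x) = w . x (illustrative only).

    X     : (n_docs, n_features) feature matrix for one query
    pairs : list of (i, j) index pairs where document i should rank above document j
    """
    # Step 1: scores s[i] = f(x[i]) under the current weights
    s = X @ w

    # Step 2: accumulate the pairwise gradients dC[i, j]/dw of the RankNet cost
    # C[i, j] = log(1 + exp(-sigma * (s[i] - s[j])))
    grad = np.zeros_like(w)
    for i, j in pairs:
        # dC/ds[i] = -sigma / (1 + exp(sigma * (s[i] - s[j]))); dC/ds[j] is its negative
        lam_ij = -sigma / (1.0 + np.exp(sigma * (s[i] - s[j])))
        grad += lam_ij * (X[i] - X[j])   # chain rule: ds[i]/dw = x[i], ds[j]/dw = x[j]

    # Step 3: gradient descent step on the cost
    return w - lr * grad
```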
In "Speeding up RankNet" paragraph, the notion λ[i] was introduced as contributions of each document's computed scores (using the model weights at current iteration) to the overall gradient ∂C/∂w (at current iteration). If we order our list of documents by the scores from the model at current iteration, each λ[i] can be thought of as "arrows" attached to each document, the sign of which tells us to which direction, up or down, that document should be moved to increase NDCG. Again, NCDG is computed from the order, predicted by our model.
Now, the problem is that the lambdas λ[i, j] for each pair (i, j) contribute equally to the overall gradient. That means the ordering of documents below, say, the 100th position is given the same importance as the ordering of the top documents. This is not what we want: having relevant documents at the very top matters much more than getting the ordering right below the 100th position.
That's why we multiply each of those "arrows" by |ΔNDCG|: it emphasizes the ranking at the top of the list over the ranking at the bottom.
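Here is a hedged sketch of that weighting, with each pairwise lambda scaled by the |ΔNDCG| obtained by swapping the two documents in the current ranking; the 2^rel − 1 gain, the log2 discount and all names are assumptions made for illustration, not something specified above:

```python
import numpy as np

def lambdarank_lambdas(s, relevance, pairs, sigma=1.0):
    """LambdaRank-style weighting: scale each pairwise lambda by |delta NDCG|.

    s         : current model scores, one per document
    relevance : graded relevance labels, used for the NDCG gains
    pairs     : (i, j) index pairs where document i should rank above document j
    """
    # Rank documents by the current scores; position[i] is i's 0-based rank
    order = np.argsort(-s)
    position = np.empty_like(order)
    position[order] = np.arange(len(s))

    gain = 2.0 ** np.asarray(relevance, dtype=float) - 1.0
    discount = 1.0 / np.log2(2.0 + position)           # 1 / log2(rank + 1), rank starting at 1
    ideal_dcg = np.sum(np.sort(gain)[::-1] / np.log2(2.0 + np.arange(len(s))))
    ideal_dcg = max(ideal_dcg, 1e-10)                  # guard against all-zero relevance

    lam = np.zeros_like(s, dtype=float)
    for i, j in pairs:
        # |change in NDCG| if documents i and j swapped positions in the current ranking
        delta_ndcg = abs((gain[i] - gain[j]) *
                         (discount[position[i]] - discount[position[j]])) / ideal_dcg
        lam_ij = -sigma / (1.0 + np.exp(sigma * (s[i] - s[j]))) * delta_ndcg
        lam[i] -= lam_ij   # arrows for top-of-list pairs get amplified by the larger |delta NDCG|
        lam[j] += lam_ij
    return lam
```

Pairs that involve documents near the top of the current list produce a larger |ΔNDCG|, so their arrows get amplified, which is exactly the prioritization described above.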