The first step of the find node operation is as follows (as described in the paper):
The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of).
Why does it pick the nodes directly from a single bucket, as opposed to looking for the k closest nodes across all buckets? I believe the latter is what happens in step 2 of the algorithm, as can be seen in the visualization here.
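
To make the two selection strategies I am comparing concrete, here is a minimal Python sketch. It is my own illustration, not code from the paper. The 8-bit ID space, the list-of-lists routing table, and the helper names `bucket_index`, `build_buckets`, `closest_overall` and `pick_initial` are all assumptions. So is my reading of "closest non-empty k-bucket" as the non-empty bucket whose index is nearest to the bucket the target would fall into, and the choice to take the α entries of that bucket with the smallest XOR distance to the target (the paper does not say which α to take).

```python
from typing import List

ALPHA = 3    # concurrency parameter α; 3 is the value the paper suggests
ID_BITS = 8  # toy ID width, chosen only to keep the example readable


def bucket_index(own_id: int, other_id: int) -> int:
    """Bucket that other_id falls into: position of the highest differing bit."""
    return (own_id ^ other_id).bit_length() - 1


def build_buckets(own_id: int, known_ids: List[int]) -> List[List[int]]:
    """Hypothetical routing table: one list per bit position, no size cap."""
    buckets: List[List[int]] = [[] for _ in range(ID_BITS)]
    for node_id in known_ids:
        if node_id != own_id:
            buckets[bucket_index(own_id, node_id)].append(node_id)
    return buckets


def closest_overall(buckets: List[List[int]], target: int, count: int) -> List[int]:
    """Strategy B: the `count` closest nodes across all buckets (XOR metric)."""
    all_nodes = [n for bucket in buckets for n in bucket]
    return sorted(all_nodes, key=lambda n: n ^ target)[:count]


def pick_initial(own_id: int, target: int, buckets: List[List[int]],
                 alpha: int = ALPHA) -> List[int]:
    """Strategy A: the quoted first step, under the assumptions stated above."""
    non_empty = [i for i, bucket in enumerate(buckets) if bucket]
    if not non_empty:
        return []
    start = bucket_index(own_id, target)
    # Assumption: "closest" bucket means nearest bucket index to the target's bucket.
    idx = min(non_empty, key=lambda i: abs(i - start))
    bucket = buckets[idx]
    if len(bucket) >= alpha:
        return sorted(bucket, key=lambda n: n ^ target)[:alpha]
    # Fewer than alpha entries: fall back to the alpha closest nodes known.
    return closest_overall(buckets, target, alpha)


own = 0b00000000
target = 0b10000001
table = build_buckets(own, [1, 2, 5, 128, 130, 160, 200])
print(pick_initial(own, target, table))      # strategy A
print(closest_overall(table, target, ALPHA)) # strategy B
```

In this toy table both strategies return the same three nodes, since every node in the target's own bucket shares a longer ID prefix with the target than any node outside it. That agreement need not hold when the target's own bucket is empty and a neighbouring bucket is used instead, which is the case my question is really about.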