If this tree is stored in an array, it can be rearranged to be "level continuous" (the first element is the root, the next two its children, the next four their children, etc), and the problem is trivial.
If it is stored in another way, it becomes problematic. You could try a breadth-first traversal, but that can consume O(n) memory.
Well I guess you can create an O(n log n) time algorithm by storing the current level and the path to the current element (represented as a binary number), and only storing the last visited element to be able to create pairs. Only O(1 + log n) memory. -> This might actually be an O(n) algorithm with backtracking (see below).
I know there is an easy algorithm that visits all nodes in order in O(n), so there might be a trick to visit "sister" nodes in order in O(n) time. O(n log n) time is guaranteed tho, you could just stop at a given level.
There is a trivial O(n log n) algorithm as well, you just have to filter the nodes for a given level, increasing the level for the next loop.
Edit:
Okay, I created a solution that runs in O(n) with O(1) memory (two machine word sized variables to keep track of the current and maximal level /which is technically O(log log n) memory, but let's gloss over that/ and a few Nodes), but it requires that all Nodes contain a pointer to their parent. With this special structure it is possible to do an inorder traversal without an O(n log n) stack, only using two Nodes for stepping either left, up or right. You stop at a particular maximum level and never go below it. You keep track of the actual and the maximum level, and the last Node you encountered on the maximum level. Obviously you can print out such pairs if you encounter the next at the max level. You increase the maximum level each time you realize there is no more on that level.
Starting from the root Node in an n-1 node binary tree, this amounts to 1 + 3 + 7 + 15 + ... + n - 1 operations. This is 2 + 4 + 8 + 16 + ... + n - log2n = 2n - log2n = O(n) operations.
Without the special Node* parent
members, this algorithm is only possible with O(log n) memory due to the stack needed for the in-order traversal.