From my understanding, even with a tail call, new frames are still pushed onto the stack, and are thus traceable
You misunderstand.
From Wikipedia:
Tail calls can be implemented without adding a new stack frame to the call stack. [emphasis mine] Most of the frame of the current procedure is no longer needed, and can be replaced by the frame of the tail call, modified as appropriate (similar to overlay for processes, but for function calls). The program can then jump to the called subroutine. Producing such code instead of a standard call sequence is called tail call elimination.
Since "sibling calls" are just a special case of tail calls, they can be optimized in the same way. You should see this wherever the compiler optimizes other tail calls, including the examples described in the Wikipedia article quoted above.
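To make this concrete, here is a minimal sketch in C. The function names (`is_even`, `is_odd`, `factorial`, `factorial_acc`) are made up for illustration and do not come from the question or the article. It contrasts calls in tail position, which a compiler may turn into a jump that reuses the current frame, with a call that is not in tail position. Whether the elimination actually happens depends on the compiler and flags; with GCC, for example, `-foptimize-sibling-calls` is enabled at `-O2` and above, and is not applied at `-O0`.

```c
#include <stdbool.h>

/* Hypothetical example functions, not taken from the original question. */

bool is_odd(unsigned n);

/* The call to is_odd is the last thing is_even does, and its result is
 * returned unchanged, so it is a tail call. Because caller and callee
 * have the same signature, this is the kind of call GCC refers to as a
 * sibling call. */
bool is_even(unsigned n)
{
    if (n == 0)
        return true;
    return is_odd(n - 1);   /* tail call: may compile to a jump */
}

bool is_odd(unsigned n)
{
    if (n == 0)
        return false;
    return is_even(n - 1);  /* tail call: may compile to a jump */
}

/* NOT a tail call: the multiplication happens after the recursive call
 * returns, so the caller still has work to do with the result. */
unsigned long factorial(unsigned n)
{
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}

/* Tail-recursive variant: the pending multiplication is carried in the
 * accumulator, so the recursive call is the function's final action. */
unsigned long factorial_acc(unsigned n, unsigned long acc)
{
    if (n <= 1)
        return acc;
    return factorial_acc(n - 1, acc * n);  /* tail call */
}
```

If you compile this with optimization and inspect the generated assembly (for example with `gcc -O2 -S`), you would typically expect the tail calls to appear as jumps or loops rather than call instructions. When that happens, a debugger backtrace taken inside a deep `is_even`/`is_odd` chain will not show one frame per call, which is exactly why the intermediate calls are not traceable.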