I'm not able to respond directly to the quoted text, because I don't have the book it comes from, nor do I know the author's intent.
However, an independent program counter per thread is considered a new feature in Volta; see Figure 21 and its caption in the Volta whitepaper:
Volta maintains per-thread scheduling resources such as program counter (PC) and call stack (S), while earlier architectures maintained these resources per warp.
The same whitepaper probably does about as good a job as you will find of explaining why this is needed in Volta, and presumably it carries forward to newer architectures such as Turing:
Volta’s independent thread scheduling allows the GPU to yield execution of any thread, either to make better use of execution resources or to allow one thread to wait for data to be produced by another. To maximize parallel efficiency, Volta includes a schedule optimizer which determines how to group active threads from the same warp together into SIMT units. This retains the high throughput of SIMT execution as in prior NVIDIA GPUs, but with much more flexibility: threads can now diverge and reconverge at sub-warp granularity, while the convergence optimizer in Volta will still group together threads which are executing the same code and run them in parallel for maximum efficiency.
Because of this, a Volta warp could have any number of subgroups of threads (up to the warp size, 32), each of which could be at a different place in the instruction stream. The Volta designers decided that the best way to support this flexibility was to provide (among other things) a separate PC per thread in the warp.
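As a rough illustration of what this means for code, here is a minimal sketch (not taken from the whitepaper) in which the lanes of one warp take different branches and then exchange data through shared memory. The kernel name `neighbor_exchange` and the buffer `buf` are made up for this example. Since the two groups of lanes may still be at different points in the instruction stream after the branch, the sketch uses `__syncwarp()` (available since CUDA 9) rather than assuming the warp runs in lockstep:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each lane writes a value on a divergent path,
// then reads the value written by the next lane in the same warp.
__global__ void neighbor_exchange(int *out)
{
    __shared__ int buf[32];
    const int lane = threadIdx.x % 32;

    if (lane % 2 == 0) {
        buf[lane] = 2 * lane;       // even lanes take this path
    } else {
        buf[lane] = 2 * lane + 1;   // odd lanes take this path
    }

    // On Volta and newer, the even and odd lanes are not guaranteed to have
    // reconverged at this point. __syncwarp() makes the named lanes wait for
    // each other and orders their shared-memory accesses.
    __syncwarp(0xffffffffu);

    out[lane] = buf[(lane + 1) % 32];
}

int main()
{
    int *d_out = nullptr;
    int h_out[32];

    cudaMalloc(&d_out, sizeof(h_out));
    neighbor_exchange<<<1, 32>>>(d_out);   // launch exactly one warp
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    for (int i = 0; i < 32; ++i) {
        printf("%d ", h_out[i]);
    }
    printf("\n");

    cudaFree(d_out);
    return 0;
}
```

On pre-Volta hardware, code like this was often written without the explicit warp-level synchronization, relying on implicit lockstep execution. With a PC per thread, that assumption is no longer safe, which is why the synchronization is spelled out here.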