I'm reading about AlphaGo Zero's network structure and came across this cheatsheet:
I'm having a hard time understanding how skip connections work dimensionally.
Specifically, it looks to me like each residual layer ends up with two stacked copies of the input it receives. Wouldn't that cause the input to each successive block to grow exponentially with the depth of the network?
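To make my confusion concrete, here's a minimal sketch of what I currently picture a block doing. This is my own reconstruction in PyTorch, not code from the paper or the cheatsheet:

```python
import torch
import torch.nn as nn

class ConcatBlock(nn.Module):
    """A residual block as I currently picture it: the skip connection
    stacks the block's input on top of its conv output along the
    channel dimension, so the channel count doubles every block."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # Concatenate input and conv output along channels:
        # (N, C, H, W) -> (N, 2C, H, W)
        return torch.cat([x, self.conv(x)], dim=1)

x = torch.randn(1, 256, 19, 19)  # 256 filters on a 19x19 board, as in AlphaGo Zero's tower
block = ConcatBlock(256)
print(block(x).shape)  # torch.Size([1, 512, 19, 19]) -- channels doubled
```

If each block works like this, then a tower of n blocks would turn C channels into 2^n * C channels, which is exactly the exponential blow-up I'm worried about.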
And could this be avoided by changing the output channel size of the conv2d filter? I see that in_C and out_C don't have to be equal in PyTorch, but I don't know enough to understand the implications of choosing them differently.
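For reference, here's the kind of thing I was imagining: a conv whose out_C is half its in_C, which I thought might cancel out the doubling. Again, this is just my own toy example:

```python
import torch
import torch.nn as nn

# Toy check: in_C and out_C of a conv2d are allowed to differ.
# Here a 3x3 conv maps 512 channels back down to 256, leaving H and W unchanged.
conv = nn.Conv2d(in_channels=512, out_channels=256, kernel_size=3, padding=1)
y = conv(torch.randn(1, 512, 19, 19))
print(y.shape)  # torch.Size([1, 256, 19, 19])
```

This runs and gives the shape I'd expect, but I don't know whether this is actually how the network keeps its size constant, or whether I'm misreading what the skip connection does in the first place.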