A common initializer for sigmoid-based networks is the Xavier initializer (a.k.a. the Glorot initializer), named after Xavier Glorot, one of the authors of the paper "Understanding the difficulty of training deep feedforward neural networks". The formula takes into account not only the number of incoming connections, but the number of outgoing ones as well. The authors show that with this initialization, the variance of the activations stays roughly constant across layers, which helps gradient flow in the backward pass.
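As a rough sketch of the idea (plain NumPy, with a hypothetical helper name), the uniform variant of the Xavier formula looks like this:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: W ~ U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which gives
    # Var(W) = 2 / (fan_in + fan_out).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))
```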
For ReLU-based networks, a better choice is the He initializer from "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" by Kaiming He et al., which establishes the same properties for the ReLU activation.
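A minimal sketch of the normal variant of He initialization, again in plain NumPy with a hypothetical helper name:

```python
import numpy as np

def he_normal(fan_in, fan_out):
    # He initialization: W ~ N(0, sqrt(2 / fan_in)); the factor of 2
    # compensates for ReLU zeroing out roughly half of the activations.
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))
```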
Dense and convolutional layers aren't that different in this respect, but it's important to remember that kernel weights are shared across spatial positions of the input and across the batch, so the number of incoming connections per output unit is determined by the kernel size and the number of input channels, and might not be obvious at first glance.
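For illustration (a hypothetical helper, not a library function), the fan-in of a convolutional layer can be computed as:

```python
def conv_fan_in(kernel_h, kernel_w, in_channels):
    # Each output unit of a conv layer is connected to a
    # kernel_h x kernel_w patch across all input channels,
    # regardless of image size or batch size (weights are shared).
    return kernel_h * kernel_w * in_channels

# e.g. a 3x3 kernel over 64 input channels: 3 * 3 * 64 = 576 incoming connections
print(conv_fan_in(3, 3, 64))  # 576
```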
In TensorFlow, He initialization is implemented in the variance_scaling_initializer() function (which is, in fact, a more general initializer, but performs He initialization by default), while the Xavier initializer is, logically, xavier_initializer().
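A usage sketch, assuming TensorFlow 1.x, where both initializers live under tf.contrib.layers:

```python
import tensorflow as tf

# He initialization: variance_scaling_initializer() defaults to
# factor=2.0, mode='FAN_IN', i.e. Var(W) = 2 / fan_in.
he_init = tf.contrib.layers.variance_scaling_initializer()
xavier_init = tf.contrib.layers.xavier_initializer()

w_relu = tf.get_variable("w_relu", shape=[784, 256], initializer=he_init)
w_sigmoid = tf.get_variable("w_sigmoid", shape=[784, 256], initializer=xavier_init)
```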
See also this discussion on CrossValidated.