VGG-16 with ImageNet weights is probably not a good fit for a colorisation problem. VGG is a neural network developed and trained for image classification: it maps an image to a class label, not to another image. Its ImageNet weights help with transfer learning to custom classification tasks on commonly found classes, such as Cats vs Dogs or Cars vs Bikes.
Auto-encoders are a different class of neural networks altogether that focus on input -> output mappings. An auto-encoder is well suited to a problem like this: you input one type of image and get a modified version of the same image back. Colorisation and denoising are two common uses of auto-encoders.
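To make the input -> output idea concrete, here is a minimal sketch of how training pairs for a colorisation auto-encoder could be built: the grayscale version of an image is the input and the original colour image is the target. The helper names `to_grayscale` and `make_training_pair` are made up for illustration, the image is a plain nested list rather than a real tensor, and the luma weights are the standard ITU-R BT.601 ones. No network is trained here.

```python
# Hypothetical helpers, pure Python, only meant to illustrate the data layout.

def to_grayscale(rgb_image):
    """Convert an H x W image of (r, g, b) tuples into H x W luma values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def make_training_pair(rgb_image):
    """Return (network_input, network_target) for a colorisation auto-encoder."""
    # Input: the grayscale image. Target: the same image in colour.
    return to_grayscale(rgb_image), rgb_image


# A tiny 2 x 2 example image: red, green, blue and white pixels.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
x, y = make_training_pair(rgb)
print(len(x), len(x[0]))  # spatial size of the input matches the target
```

A classifier such as VGG would instead be trained on (image, label) pairs, which is why its pretrained head does not carry over directly.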
As an alternative, you can use U-Net in combination with GANs to tackle this problem statement. More details here
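As a rough illustration of why U-Net fits image-to-image tasks, the sketch below traces feature-map shapes through a simplified U-Net: the encoder halves the resolution at each level, the decoder doubles it again, and each decoder level concatenates the encoder feature map of the same resolution (the skip connection). Everything here is an assumption for illustration: the function name `unet_shapes`, the filter counts, the depth, and the use of plain upsampling that keeps the channel count. In a GAN setup a network like this would typically play the generator, with a separate discriminator that is not shown.

```python
def unet_shapes(height, width, in_channels=1, out_channels=3,
                base_filters=64, depth=3):
    """Trace (height, width, channels) through a simplified U-Net."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    shapes = [("input", (height, width, in_channels))]
    skips = []
    h, w, filters = height, width, base_filters

    # Encoder: conv block, remember the feature map, then downsample by 2.
    for level in range(depth):
        shapes.append((f"enc{level}", (h, w, filters)))
        skips.append((h, w, filters))
        h, w, filters = h // 2, w // 2, filters * 2

    shapes.append(("bottleneck", (h, w, filters)))

    # Decoder: upsample by 2, concatenate the matching encoder map, conv block.
    for level in reversed(range(depth)):
        skip_h, skip_w, skip_c = skips[level]
        h, w = h * 2, w * 2
        assert (h, w) == (skip_h, skip_w)
        shapes.append((f"concat{level}", (h, w, filters + skip_c)))
        filters = skip_c
        shapes.append((f"dec{level}", (h, w, filters)))

    # Final layer maps back to an image with the input's spatial size.
    shapes.append(("output", (h, w, out_channels)))
    return shapes


# Grayscale 64 x 64 input, 3-channel colour output.
for name, shape in unet_shapes(64, 64):
    print(name, shape)
```

The output has the same height and width as the input, which is the property a colorisation model needs and a classification head lacks.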