I am planning to train a deep learning model to convert medical images between modalities (e.g., a residual U-Net that translates an MRI image into a CT image of the same region).
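For context, here is a minimal sketch of the kind of residual U-Net generator I mean (PyTorch assumed; the layer sizes and names are illustrative, not my exact model):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convs with a residual (identity) connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class ResUNet(nn.Module):
    """Illustrative encoder-decoder with residual blocks and a U-Net skip."""
    def __init__(self, in_ch=1, out_ch=1, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), ResBlock(base))
        self.down = nn.Conv2d(base, base * 2, 4, stride=2, padding=1)
        self.enc2 = ResBlock(base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)
        self.fuse = nn.Conv2d(base * 2, base, 1)  # merge skip + upsampled features
        self.dec1 = ResBlock(base)
        self.head = nn.Conv2d(base, out_ch, 1)
    def forward(self, x):
        s1 = self.enc1(x)
        b = self.enc2(self.down(s1))
        u = self.up(b)
        u = self.fuse(torch.cat([u, s1], dim=1))  # U-Net skip connection
        return self.head(self.dec1(u))
```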
I have a dataset of paired MRIs and CT scans from the same subjects. I have tried registering the images to each other with deformable registration so that anatomic landmarks are aligned, but the registration is not perfect. As a result, the positions of anatomic landmarks can differ slightly between the two modalities in each image pair. What loss function should I use to train my model? It needs to be robust to these misalignments between the output and the ground-truth image.

I've tried MSE loss and perceptual loss, but the results are noisy, blurry, and inaccurate.
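For reference, this is roughly how I'm computing those two losses (a minimal PyTorch sketch; the VGG16 layer cutoff and the channel-repeat trick are illustrative choices, not necessarily my exact setup):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Per-pixel MSE between the generated CT and the (imperfectly registered) ground truth.
def mse_loss(fake_ct, real_ct):
    return F.mse_loss(fake_ct, real_ct)

# Perceptual loss: compare VGG16 feature maps instead of raw pixels.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake_ct, real_ct):
    # VGG expects 3 channels, so single-channel slices are repeated.
    fake3 = fake_ct.repeat(1, 3, 1, 1)
    real3 = real_ct.repeat(1, 3, 1, 1)
    return F.mse_loss(vgg_features(fake3), vgg_features(real3))
```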
I also tried the pix2pix GAN model. It generates sharper images, but they are not faithful translations of the input: they contain additional artifacts that look like the model has hallucinated structures.
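This is the shape of the pix2pix objective I used (a sketch of the standard conditional-GAN + L1 formulation; the lambda value and the `disc` discriminator are illustrative):

```python
import torch
import torch.nn.functional as F

# pix2pix-style generator objective: adversarial term + lambda * L1 to the paired target.
def generator_loss(disc, mri, fake_ct, real_ct, lam=100.0):
    # The conditional discriminator sees the input MRI concatenated with the CT.
    pred_fake = disc(torch.cat([mri, fake_ct], dim=1))
    adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))
    l1 = F.l1_loss(fake_ct, real_ct)
    return adv + lam * l1

def discriminator_loss(disc, mri, fake_ct, real_ct):
    pred_real = disc(torch.cat([mri, real_ct], dim=1))
    pred_fake = disc(torch.cat([mri, fake_ct.detach()], dim=1))
    loss_real = F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real))
    loss_fake = F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (loss_real + loss_fake)
```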