I am working on reproducing the results reported in this paper. A U-Net-based network is used to estimate a sound speed map from raw ultrasound channel data. I have been stuck on further reducing the train/val loss for a long time.
Basically, I followed their methods of data simulation and preprocessing, and used the same network architecture and hyperparameters (including kernel initializer, batch size, decay rate, etc.). The input size is 128×1024 rather than 192×2048, to match my ultrasound probe (according to their recent paper, the input size shouldn't affect performance).
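For what it's worth, after changing the input size I ran a quick shape sanity check (a minimal sketch; the range of pooling depths is my own assumption, since the paper doesn't state the exact U-Net depth):

```python
# Sanity check: a U-Net with d max-pooling stages needs both spatial
# dimensions divisible by 2**d, or the decoder/skip shapes misalign.
def unet_shape_ok(h: int, w: int, depth: int) -> bool:
    return h % (2 ** depth) == 0 and w % (2 ** depth) == 0

# 128x1024 input; the depth values tried here are hypothetical.
for depth in range(1, 8):
    print(f"depth={depth}: {unet_shape_ok(128, 1024, depth)}")
```

All depths up to 7 pass for 128×1024, so I don't think the resizing itself breaks the architecture.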
So my question is: do you have any suggestions for further investigating this problem, based on your experience?
I attached my results (RMSE loss curves and estimated sound speed images) together with the corresponding results from the paper:

[My results: RMSE loss curve and estimated sound speed map]

[Their results: RMSE loss curve and estimated sound speed map]

It seems my network fails to converge comparably in the background region, which could explain why I get a larger initial loss.
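To quantify the background issue, I split the RMSE by region (a minimal NumPy sketch; the 1540 m/s background value and the tolerance are my assumptions based on typical tissue-mimicking simulations, not values from the paper):

```python
import numpy as np

def region_rmse(pred, target, background_value=1540.0, tol=1.0):
    """Split RMSE into background vs. inclusion pixels.

    Assumes the ground-truth map marks the background as a constant
    sound speed (e.g. 1540 m/s); adjust background_value/tol as needed.
    """
    bg = np.abs(target - background_value) < tol
    sq_err = (pred - target) ** 2
    rmse_bg = np.sqrt(sq_err[bg].mean()) if bg.any() else float("nan")
    rmse_inc = np.sqrt(sq_err[~bg].mean()) if (~bg).any() else float("nan")
    return rmse_bg, rmse_inc
```

This at least tells me whether the gap to their results comes mostly from the background or from the inclusions.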
PS: Unfortunately, the paper didn't provide code, so I have no clue about some of the details of the data simulation and training. I have contacted the authors but haven't gotten a response yet.
The author mentioned somewhere that, instead of using a pixel-wise MSE, one could try a larger window size (3×3 or 5×5). I am not clear whether this is meant for training or for metric evaluation; is there any reference for the former? My current guess at a windowed training loss is sketched below.
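In case it helps clarify the question, this is how I interpret a windowed MSE as a training loss (a minimal TensorFlow/Keras sketch of my own interpretation; I don't know whether this matches what the author meant):

```python
import tensorflow as tf

def windowed_mse(window_size=3):
    """MSE computed on local window means instead of single pixels.

    Average-pooling both maps first penalizes the mean error inside
    each window, which should be less sensitive to pixel-level noise.
    """
    def loss(y_true, y_pred):
        # Sound speed maps are assumed to have shape (batch, H, W, 1).
        pool = lambda x: tf.nn.avg_pool2d(
            x, ksize=window_size, strides=1, padding="SAME")
        return tf.reduce_mean(tf.square(pool(y_true) - pool(y_pred)))
    return loss

# Hypothetical usage:
# model.compile(optimizer="adam", loss=windowed_mse(window_size=3))
```

If this kind of loss is only intended as an evaluation metric, I'd be glad to know, since that would change how I compare my curves against theirs.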