I put this answer together with the help of this essay.
There are several possible reasons for this:
1: If you use regularization, the regularization term (for example, an L2 penalty on the weights) is added to the training loss, so the training loss is higher than the validation loss, which contains no such penalty. The gap between the two usually shrinks after a few epochs. Note that a lower loss does not necessarily mean a higher accuracy (see the first sketch after this list).
2: Training loss is averaged over the batches within each epoch, so it partly reflects the model before its weights were fully updated, while validation loss is computed only once, at the end of the epoch, with the final weights. This can make the validation loss lower than the training loss. After many epochs, however, the validation loss typically rises above the training loss. Here too, a lower loss does not necessarily mean a higher accuracy (the second sketch below shows the usual half-epoch correction when plotting).
3: Another reason: noise in a dataset is inevitable, and sometimes the training set contains more outliers than the validation set. The model then finds the validation labels easier to predict, so it has both a lower loss and a higher accuracy on validation (the third sketch below simulates this).
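To make reason 1 concrete, here is a minimal PyTorch sketch (the model, data, and `lambda_l2` value are all made up for illustration) of how an L2 penalty inflates the logged training loss while the validation loss contains only the data term:

```python
import torch
import torch.nn as nn

# Tiny made-up model and data, just to show the bookkeeping.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

lambda_l2 = 1e-2  # regularization strength (assumed value)

# Training loss: data term plus the L2 penalty on the weights.
data_loss = criterion(model(x), y)
l2_penalty = lambda_l2 * sum(p.pow(2).sum() for p in model.parameters())
train_loss = data_loss + l2_penalty  # this is the value that gets logged

# Validation loss: only the data term, no penalty.
with torch.no_grad():
    val_loss = criterion(model(x), y)

# Even on identical data, train_loss >= val_loss because of the penalty.
print(float(train_loss), float(val_loss))
```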
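For reason 2, a common trick is to shift the training-loss curve half an epoch to the left when plotting, since on average the logged training loss reflects the model as it was mid-epoch, not at the epoch's end. A sketch with made-up `history` numbers (the kind you would get from Keras-style per-epoch logging):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-epoch history; the values are invented for illustration.
history = {
    "loss":     [0.90, 0.55, 0.40, 0.33, 0.29],  # averaged over batches *during* each epoch
    "val_loss": [0.60, 0.45, 0.36, 0.31, 0.28],  # computed *after* each epoch finishes
}
epochs = np.arange(1, len(history["loss"]) + 1)

plt.plot(epochs, history["loss"], label="train loss (as logged)")
# Shift the training curve half an epoch left to compare like with like.
plt.plot(epochs - 0.5, history["loss"], "--", label="train loss (shifted 0.5 epoch)")
plt.plot(epochs, history["val_loss"], label="val loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```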
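And for reason 3, here is a scikit-learn simulation; the 10% label-flip rate is an assumption, chosen only to stand in for "more outliers in training". The noisy training split yields a higher loss and lower accuracy than the clean validation split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Simulate a noisier training set: flip 10% of the training labels
# (assumed noise rate), while the validation labels stay clean.
noisy = rng.rand(len(y_tr)) < 0.10
y_tr_noisy = np.where(noisy, 1 - y_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr_noisy)

# Each split is scored against the labels the model actually sees in it.
print("train loss:", log_loss(y_tr_noisy, clf.predict_proba(X_tr)))
print("val   loss:", log_loss(y_val, clf.predict_proba(X_val)))
print("train acc :", accuracy_score(y_tr_noisy, clf.predict(X_tr)))
print("val   acc :", accuracy_score(y_val, clf.predict(X_val)))
```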
A more extensive explanation of your question can be found here.