There is a confirmation here, apparently from a Google engineer, that MLKit cannot determine whether two detected faces belong to the same person. However, according to the Face detection documentation:
Track faces across video frames: Get an identifier for each unique detected face. The identifier is consistent across invocations, so you can perform image manipulation on a particular person in a video stream.
I wonder why this doesn't work on a list of photos, since a video is just a sequence of photo frames. MLKit currently seems to be the only on-device face detection library that works without calling a remote API, so it would be great if MLKit supported face recognition as well.
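To make the question concrete, here is a minimal sketch of one way a tracker could hand out consistent IDs in video without recognizing anyone. It is an assumption for illustration only, not MLKit's actual implementation: `Box`, `iou`, `ToyTracker`, and the 0.5 overlap threshold are all hypothetical names and values. The idea is that a face keeps its ID only if its bounding box overlaps the box it had in the previous frame.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Box:
    """Hypothetical face bounding box in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter_w = max(0.0, min(a.right, b.right) - max(a.left, b.left))
    inter_h = max(0.0, min(a.bottom, b.bottom) - max(a.top, b.top))
    inter = inter_w * inter_h
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class ToyTracker:
    """Position-based tracker (an assumption, not MLKit's algorithm)."""

    def __init__(self, min_iou: float = 0.5):
        self.min_iou = min_iou
        self._next_id = count()
        self._previous = {}  # tracking ID -> box seen in the previous frame

    def update(self, boxes):
        """Return one tracking ID per box, reusing an ID when boxes overlap."""
        unclaimed = dict(self._previous)
        current = {}
        ids = []
        for box in boxes:
            # Pick the previous box that overlaps this one the most.
            best = max(unclaimed, key=lambda i: iou(unclaimed[i], box), default=None)
            if best is not None and iou(unclaimed[best], box) >= self.min_iou:
                del unclaimed[best]  # each old ID can be reused only once
            else:
                best = next(self._next_id)  # no overlap: treat as a new face
            current[best] = box
            ids.append(best)
        self._previous = current
        return ids


tracker = ToyTracker()
first = tracker.update([Box(100, 100, 200, 200)])   # video frame 1
second = tracker.update([Box(105, 102, 205, 202)])  # frame 2, face moved slightly
third = tracker.update([Box(400, 50, 480, 130)])    # unrelated photo, same person
print(first, second, third)
```

In this sketch the first two calls return the same ID because the box barely moved, while the third call returns a new ID even if it is the same person, since nothing about the face itself is compared. If MLKit's tracking works on a similar position-and-motion basis, that would explain why the IDs are not meaningful across separate photos.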