I'm looking for advice on how to train a model to tell the difference between a photo of an actual object and a photo of an image of that object, specifically for credit cards and driver's licenses.
For instance, Google Pay and similar payment apps let you add a credit card using the camera. But it turns out they can't tell whether I'm taking a photo of my actual card or a photo of an image of my card on a screen. My app needs to do something similar, but it must be able to tell whether the card is real.
The reason is that users sometimes try to pass off someone else's identity as their own: they have a photo of someone else's card and use the app to take a photo of that photo (the app doesn't allow uploads, only live photos).
I'm also looking for successful implementations that already exist to study them. I couldn't really find anything, possibly because most of it is proprietary and isn't an advertised feature.
The first step I plan to take is, of course, generating the dataset, which is pretty labor intensive. This means printing fake cards onto plastic, then taking photos of them. The model I'm aiming for should classify a photo as (1) an actual card, (2) an image of the card on a screen, or (3) an image of the card on paper (printed or photocopied). It seems possible because most humans can tell the difference (from the glare on the screen, the texture of the paper, etc.). Note that all the cards are issued by the same entity (same logo, color, etc.), so a lot of things should be constant.
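To make the dataset step a bit more concrete, here is a minimal sketch of how I imagine organizing the photos once they are taken. It assumes one folder per class and a stratified train/validation split, so each of the three classes keeps roughly the same share in both sets. The folder names (`real_card`, `screen`, `paper`) and the helper names `build_manifest` and `stratified_split` are made up for illustration and don't come from any library.

```python
import random
from collections import defaultdict
from pathlib import Path

# The three classes from the question; the folder names are hypothetical.
CLASSES = ("real_card", "screen", "paper")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_manifest(root):
    """Return (path, label_index) pairs found under root/<class_name>/."""
    manifest = []
    for label, name in enumerate(CLASSES):
        class_dir = Path(root) / name
        if not class_dir.is_dir():
            continue  # tolerate a class that has no photos yet
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                manifest.append((str(path), label))
    return manifest


def stratified_split(manifest, val_fraction=0.2, seed=0):
    """Split each class separately so the class balance is preserved."""
    by_label = defaultdict(list)
    for item in manifest:
        by_label[item[1]].append(item)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_val = int(round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

One thing this sketch does not handle: if several photos show the same physical card (or the same screen or printout), they should probably all land in the same split, otherwise the validation score may look better than it really is.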
Any other non-ML suggestions are welcome.
Ceci n'est pas une pipe