Unfortunately, the data is confidential, so I can't give a more specific explanation.
The Problem
I have a few documents that generally contain the same information but in different formats. In most cases, the value I am looking for is near a keyword in the document. The OCR itself is handled by the Google Cloud Vision API, but what is the best approach for handling the different formats?
My idea
... was to train a classifier that detects which format I am dealing with and then picks the appropriate routine for finding the target value, each of which I implemented by hand beforehand. This is neither handy nor scalable. So I am looking for an algorithm that I can tell, e.g., where the target value is, what it looks like, etc.
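A rough sketch of this classify-then-dispatch setup might look like the following. Everything in it is hypothetical and only for illustration: the format names, the regexes, and especially classify_format, which is a trivial keyword check standing in for the trained classifier.

```python
import re
from typing import Callable, Dict, Optional


def extract_format_a(text: str) -> Optional[str]:
    # Hypothetical layout A: "TOTAL 3.70"
    m = re.search(r"TOTAL\s+(\d+\.\d{2})", text)
    return m.group(1) if m else None


def extract_format_b(text: str) -> Optional[str]:
    # Hypothetical layout B: "Summe: 3,70 EUR" (decimal comma)
    m = re.search(r"Summe:\s*(\d+,\d{2})", text)
    return m.group(1).replace(",", ".") if m else None


# One hand-written extractor per known format.
EXTRACTORS: Dict[str, Callable[[str], Optional[str]]] = {
    "format_a": extract_format_a,
    "format_b": extract_format_b,
}


def classify_format(text: str) -> str:
    # Stand-in for the trained classifier (assumption): a plain keyword check.
    return "format_b" if "Summe" in text else "format_a"


def extract_total(text: str) -> Optional[str]:
    # Detect the format first, then run the matching hand-written extractor.
    return EXTRACTORS[classify_format(text)](text)


print(extract_total("Milk 1.20\nTOTAL 3.70"))
print(extract_total("Milch 1,20\nSumme: 3,70 EUR"))
```

Every new format needs another hand-written extractor plus retraining of the classifier, which is the part that does not scale.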
What is the best ML approach for this problem, or what other ideas do you have?
As an example of the type of data: let's say I have receipts from 20 different supermarkets and want to find the total cost, with the problem that every company's receipt looks different.
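For the receipt case, the observation that the value sits near a keyword could be sketched as a format-independent heuristic like the one below. It assumes the OCR output has already been reduced to words with the centre coordinates of their bounding boxes; the Word type, the AMOUNT pattern, and the toy coordinates are made up for illustration and are not the Vision API response format.

```python
import math
import re
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    text: str
    x: float  # centre of the word's bounding box (assumed pre-computed)
    y: float


# Assumed shape of the target value: digits, a decimal separator, two digits.
AMOUNT = re.compile(r"^\d+[.,]\d{2}$")


def value_near_keyword(
    words: List[Word], keyword: str, pattern: "re.Pattern[str]" = AMOUNT
) -> Optional[str]:
    """Return the pattern-matching word closest to any occurrence of keyword."""
    anchors = [w for w in words if w.text.lower() == keyword.lower()]
    candidates = [w for w in words if pattern.match(w.text)]
    if not anchors or not candidates:
        return None
    # Pick the (anchor, candidate) pair with the smallest Euclidean distance.
    _, best = min(
        ((math.hypot(c.x - a.x, c.y - a.y), c) for a in anchors for c in candidates),
        key=lambda pair: pair[0],
    )
    return best.text


receipt = [
    Word("Milk", 10, 10), Word("1.20", 90, 10),
    Word("Bread", 10, 40), Word("2.50", 90, 40),
    Word("TOTAL", 10, 100), Word("3.70", 90, 100),
]
print(value_near_keyword(receipt, "total"))
```

This only needs a keyword and a value pattern per field rather than one extractor per format, but plain distance will likely fail on layouts where another amount is closer to the keyword than the real total.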