2

I'm trying to understand the best way to train a custom model for invoices in languages not supported by the prebuilt invoice model, French for example.

As usual, we will have many different invoice layouts from different vendors, but we will extract the same set of labels from all of them (invoice number, amount, date, vendor name, etc.).

Should I create a model per vendor and compose them? If I do, do I need to train a model for every vendor, or will it also work for invoice layouts it was not trained on, as long as they use the same verbiage as the trained invoices?

2 Answers

3

If you only need a few fields, such as the invoice number, amount, and date, you can try the prebuilt Invoice model and see whether it extracts the data you need. It is not yet trained on French or other non-English invoices, so the quality should be lower, but it might work. If you train custom models, you will need to train one model per provider and then compose the individual models into a single composed model. I would recommend starting with your top providers and creating models for them.

Neta
  • Thank you for your input. I have tried the built-in invoice model with other languages, but it barely recognizes any information properly. Fields like amount and date run into their own problems because of formatting: comma as a decimal separator vs. comma as a thousands separator, and dmy vs. mdy date order. As for composing, another challenge is that you can only compose up to 100 models into a single composed model. Plus, if I compose, it looks like I would have to train every single layout, which is something I'd like to avoid for obvious reasons. – Hugo Scaramal Apr 26 '21 at 19:00
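The amount and date ambiguity described in this comment can often be handled in a post-processing step once the expected locale is known. Below is a minimal sketch in plain Python, not part of Form Recognizer; the helper names `parse_amount` and `parse_date` are hypothetical, and it assumes you already know which decimal separator and day/month order a given vendor uses, and that dates are written with slashes.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_amount(text: str, decimal_sep: str = ",") -> Decimal:
    """Parse an amount whose decimal separator is known in advance (hypothetical helper)."""
    # Every common grouping character except the decimal separator is dropped.
    grouping = {".", ",", " ", "\u00a0", "\u202f", "'"} - {decimal_sep}
    cleaned = "".join(ch for ch in text.strip() if ch not in grouping)
    cleaned = cleaned.replace(decimal_sep, ".")
    # Discard currency symbols and other leftovers, keeping digits, sign, and dot.
    cleaned = "".join(ch for ch in cleaned if ch.isdigit() or ch in "-.")
    return Decimal(cleaned)


def parse_date(text: str, day_first: bool = True) -> date:
    """Parse a slash-separated date with a known day/month order (hypothetical helper)."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text.strip(), fmt).date()


# French-style values: comma as decimal separator, day before month.
french_amount = parse_amount("1 234,56 €")
french_date = parse_date("03/04/2021")

# US-style values: dot as decimal separator, month before day.
us_amount = parse_amount("1,234.56", decimal_sep=".")
us_date = parse_date("03/04/2021", day_first=False)
```

The sketch deliberately does not guess the locale from the string itself, since a value like `03/04/2021` is ambiguous without outside knowledge such as the vendor's country.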
1

I got an answer from Microsoft on the MS Q&A site; see below:
"For invoices (I believe he meant English invoices) you should use the pre-built Invoice model, no training required - https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-invoices.
If you need to train a model and not use the pre-built, then yes, a model per vendor/provider and compose them. Start with the top providers so that you get more coverage."
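One way to act on the "start with the top providers" advice is to rank vendors by invoice volume and check how much of your traffic the first batch of models would cover. This is a rough sketch in plain Python with no Azure calls; the function name `pick_vendors` and the input list are hypothetical, and the cap of 100 is taken from the composed-model limit mentioned in the comment on the other answer.

```python
from collections import Counter

# Limit on models per composed model, as reported in the thread (assumption).
MAX_COMPOSED_MODELS = 100


def pick_vendors(invoice_vendors, limit=MAX_COMPOSED_MODELS):
    """Return the highest-volume vendors and the share of invoices they cover."""
    counts = Counter(invoice_vendors)
    top = counts.most_common(limit)
    covered = sum(n for _, n in top)
    coverage = covered / len(invoice_vendors) if invoice_vendors else 0.0
    return [vendor for vendor, _ in top], coverage


# Hypothetical history: one vendor name per invoice received.
history = ["A", "A", "A", "B", "B", "C"]
vendors, coverage = pick_vendors(history, limit=2)
```

With the sample history above, training models for only the two largest vendors would cover five of the six invoices, which gives a quick estimate of how many per-vendor models are worth the labeling effort.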

Find more information on the MS QA Question.

  • I got stuck with an authentication concept for the Form Recognizer: https://stackoverflow.com/questions/68088340/api-authentication-for-azure-form-recognizer. Can you please guide me through it? – Sachindra Jun 22 '21 at 18:29