
I customized a Custom Document Extractor from purchase_order_parser. I trained it on more than 50 documents. The test score was 51.6% and the F1 score was 0.193. So I enabled HITL, and realized that Document AI had discarded most of my training labels. I also found that I can't label my documents with custom fields because HITL doesn't show them. Any help or tips are appreciated.

  • Can you provide some more information about this? Did you create a Custom Document Extractor or did you uptrain a Purchase Order Processor? How did you set up your Training Data in Google Cloud Storage and how did you configure your Human in the Loop storage bucket? Can you clarify what you mean by Document AI getting rid of training labeling, and HITL not showing the custom fields? – Holt Skinner Aug 02 '23 at 15:39
  • Thanks for replying. What I did: I set up a bucket, then set up a processor based on a Google processor called purchase_order_parser. I just added a few labels to this processor (which I can't use in HITL). Then I trained the processor with more than 50 PDFs. I tried to do it carefully. Screenshot https://prnt.sc/R6FTtT9LlcHj – UFFICIO AGROITTICA Aug 04 '23 at 10:01
  • "how did you configure your Human in the Loop storage bucket": https://prnt.sc/D5QKwGO8dZb5 "what you mean by Document AI getting rid of training labeling, and HITL not showing the custom fields" This is the HITL output: https://prnt.sc/HGjlHL4VtBvF Many thanks – UFFICIO AGROITTICA Aug 04 '23 at 10:09
  • Can you clarify if you created a Custom Document Extractor or uptrained a Purchase Order Processor? Also, did you use the same bucket for setting up your Processor dataset and HITL? They need to be separate buckets to prevent them overwriting. – Holt Skinner Aug 08 '23 at 14:50
  • Hi Holt. Yes, I uptrained the Purchase Order Processor. And unfortunately I used the same bucket for setting up the Processor dataset and HITL. This process is stuck. So what I did is: 1. chose just 4 types of client orders (documents) 2. set up a bucket and called it bucket_puchase_order_parser_training 3. imported the 53 order PDFs (38 related to one client; 4, 9 and 2 to the other 3 clients) 4. then clicked on processor gallery, selected order_parser_porcessor and clicked create processor – UFFICIO AGROITTICA Aug 09 '23 at 15:23
  • 5. then I went into the processor training and imported all the uploaded documents: Import Documents > Choose > select bucket_puchase_order_parser_training > select training in the data split > no automatic labelling 6. I opened all the training documents one by one and set all the labels. I didn't modify the schema by adding new labels: I just used the standard bucket_puchase_order_parser labels 7. then I didn't click the button to uptrain a new version; instead I clicked the Create a labelling activity button. – UFFICIO AGROITTICA Aug 09 '23 at 15:24
  • I selected the training documents (53) and selected the pool of experts I had created before (just one expert). Since I'm the only expert, I didn't set up or choose any instructions document. An error message appeared saying "impossible to start a labelling activity while another activity is in progress". So I stopped the labelling activity by clicking on the three dots on the right. 8. so I clicked the button to uptrain a new version, but an error message appeared saying the data set doesn't meet the minimum training criteria https://prnt.sc/umDNmY_6PqXK – UFFICIO AGROITTICA Aug 09 '23 at 15:24
  • 9. so I clicked on view label statistics and found that every label is in 50 training documents and in 50 test documents. I guess there is something I am missing https://prnt.sc/xXRuqg7KP1SV I'm stuck again. I'd be grateful if you could help. Many thanks. – UFFICIO AGROITTICA Aug 09 '23 at 15:24
  • General question: if you're the only one labeling this data, is there a particular reason you used HITL instead of the Document AI Workbench dataset console to label it? The dataset console is much better suited for labeling additional training samples, and you can use auto-labeling to speed up the process since you already have a trained version. https://cloud.google.com/document-ai/docs/workbench/label-documents – Holt Skinner Aug 09 '23 at 17:28
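
As a quick sanity check for the point raised in the comments about keeping the processor dataset and HITL storage in separate buckets, here is a minimal sketch in plain Python. It only compares the bucket part of two gs:// URIs and does not call any Document AI or Cloud Storage API. The function names and the `example-hitl-results` bucket are hypothetical; only `bucket_puchase_order_parser_training` comes from the thread.

```python
from urllib.parse import urlparse


def bucket_of(gcs_uri: str) -> str:
    """Return the bucket name from a gs://bucket/path URI."""
    parsed = urlparse(gcs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {gcs_uri!r}")
    return parsed.netloc


def uses_separate_buckets(dataset_uri: str, hitl_uri: str) -> bool:
    """True when the dataset and HITL locations live in different buckets."""
    return bucket_of(dataset_uri) != bucket_of(hitl_uri)


if __name__ == "__main__":
    # Bucket name from the thread, plus a hypothetical HITL bucket.
    dataset = "gs://bucket_puchase_order_parser_training/dataset"
    hitl = "gs://example-hitl-results/output"
    print(uses_separate_buckets(dataset, hitl))
    # Same bucket, different folders: still counts as shared.
    print(uses_separate_buckets(dataset, "gs://bucket_puchase_order_parser_training/hitl"))
```

Two folders inside the same bucket are treated as shared here, which matches the advice in the comments that the two locations should be separate buckets rather than separate paths.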
