My organization's mission is to synthesise data from PDFs of research papers in social science. We have our own taxonomies of defined terms, which we extract from each paper (e.g. World Bank Sector: Health, Education and so on). The trial version of Document AI does not seem to allow automating data extraction based on controlled vocabularies. Is there a paid version that would allow that? If not, we may need to look for other products that suit our needs.

In the Edit schema section, I created the labels, but there is no way for me to attach a controlled vocabulary to them.

1 Answer

Document AI doesn't currently have a feature for extracting values from controlled vocabularies or custom word lists.

What you can try is a two-step approach: first use the Document OCR processor to extract all of the text from the PDF documents, then feed that text into a custom natural language model (using something like Vertex AI AutoML Text) that can handle your custom vocabulary use cases.
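As a rough illustration of the second step, here is a minimal sketch that tags already-extracted text against a controlled vocabulary. It is a plain keyword-matching baseline, not a trained model and not part of Document AI or Vertex AI. The `TAXONOMY` contents, the trigger phrases, and the `tag_text` function are all hypothetical, and the input string is assumed to be the text returned by the OCR step.

```python
import re
from collections import defaultdict

# Hypothetical taxonomy: label -> controlled term -> trigger phrases.
# Replace with your organization's own vocabulary.
TAXONOMY = {
    "World Bank Sector": {
        "Health": ["health", "hospital", "clinic"],
        "Education": ["education", "school", "teacher"],
    },
}


def tag_text(text, taxonomy=TAXONOMY):
    """Count whole-word trigger-phrase hits per controlled term.

    `text` is assumed to be plain text already extracted by OCR.
    Returns {label: {term: hit_count}} for terms with at least one hit.
    """
    results = defaultdict(dict)
    lowered = text.lower()
    for label, terms in taxonomy.items():
        for term, phrases in terms.items():
            hits = sum(
                len(re.findall(r"\b" + re.escape(p.lower()) + r"\b", lowered))
                for p in phrases
            )
            if hits:
                results[label][term] = hits
    return dict(results)


if __name__ == "__main__":
    sample = "The clinic and the hospital improved health outcomes."
    print(tag_text(sample))
```

A baseline like this only catches exact phrases (it would miss plurals such as "schools" unless they are listed), which is one reason a trained text classifier may be the better fit once you have labelled examples.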

Holt Skinner