I am trying to explore custom entity extraction using GCP AutoML. I trained the model on 10-page documents, and some of the entities I labelled appear on pages 7 and 8 as well.

While testing from the GCP AutoML UI with one of the training documents, the model is not able to extract entities beyond the first 5 pages.

- Is this a default page limit as of now?
- Or is it configurable? If yes, how can it be changed?
- Or can we request GCP Support to consider the complete document length?

Any pointers are appreciated.

1 Answer

The entity extraction limits don't allow documents of more than 10000 characters, and as far as I know, this can't be modified. The Natural Language Processing API Entity Extraction feature is intended to analyse entities in short documents, so if you need to work with longer documents, I would encourage you to split them into smaller parts and submit each part separately.
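As a rough illustration of splitting a long document, here is a minimal Python sketch. It assumes you have already extracted the document's plain text, and it takes the 10000-character figure above as the limit. The function name `split_into_chunks` and the `max_chars` parameter are hypothetical, not part of any GCP library. It only prepares the text and does not call the AutoML API.

```python
def split_into_chunks(text, max_chars=10000):
    """Split text into (offset, chunk) pairs of at most max_chars characters.

    Hypothetical helper: max_chars=10000 is an assumption taken from the
    limit quoted in this answer, not a value read from the API.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to cut at the last space or newline in the window,
            # so that words (and likely entities) are not split in half.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        # Store the starting offset so entity positions found in a chunk
        # can be mapped back to positions in the full document.
        chunks.append((start, text[start:end]))
        start = end
    return chunks


if __name__ == "__main__":
    sample = "word " * 5000  # 25000 characters of dummy text
    for offset, chunk in split_into_chunks(sample):
        print(offset, len(chunk))
```

Each chunk can then be sent for prediction on its own. Keeping the offset lets you shift any entity positions returned for a chunk back into the coordinates of the original document. An entity that happens to straddle a chunk boundary could still be missed, so cutting at paragraph boundaries may work better for some documents.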

I hope that helps.

Hyperion
  • Thanks, but the first 5 pages of the document I tried contain ~18k characters, so from what I have seen the restriction is based on the document's page count. I couldn't find anything in the Quotas documentation about the Cloud AutoML API limiting by PDF page count. – Sheetal Lomate Feb 28 '20 at 11:03
  • Is this replicated with other documents too? – Hyperion Mar 04 '20 at 11:36