Can we pass table column info to help FormParser determine header_row contents?

Question

Suppose I have a PDF file containing the following table:

Trainer: Giannis

Pokedex: Incomplete

Name	Type	Weight	Height	Color
Pikachu	Electric	6.0 kg	0.4 m	Yellow
Bulbasaur	Grass/Poison	6.9 kg	0.7 m	Green
Charizard	Fire/Flying	90.5 kg	1.7 m	Orange
Jigglypuff	Normal/Fairy	5.5 kg	0.5 m	Pink
Gyarados	Water/Flying	235.0 kg	6.5 m	Blue

I am using the Form Parser to extract the table information.
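For context, here is a minimal sketch of how Form Parser table output is typically read with the `google-cloud-documentai` Python client. It assumes a PDF input, and `project_id`, `location`, `processor_id`, and the helper names are placeholders or hypothetical. The helpers only rely on the `Document` structure (`pages[].tables[]`, `header_rows`/`body_rows`, and `layout.text_anchor.text_segments`), so the `header_rows` the question refers to are the rows the model itself decided are headers.

```python
def layout_to_text(layout, text: str) -> str:
    """Join the document text spans referenced by a Layout's text anchor."""
    return "".join(
        text[int(seg.start_index):int(seg.end_index)]
        for seg in layout.text_anchor.text_segments
    ).strip()


def table_to_rows(table, text: str) -> tuple[list[list[str]], list[list[str]]]:
    """Return (header_rows, body_rows) as lists of cell strings."""
    def convert(rows):
        return [[layout_to_text(cell.layout, text) for cell in row.cells] for row in rows]
    return convert(table.header_rows), convert(table.body_rows)


def process_pdf(project_id: str, location: str, processor_id: str, path: str):
    """Send a PDF to a Form Parser processor and return each table's rows."""
    # Imported lazily so the helpers above work without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    name = client.processor_path(project_id, location, processor_id)
    with open(path, "rb") as f:
        raw = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw)
    )
    document = result.document
    return [
        table_to_rows(table, document.text)
        for page in document.pages
        for table in page.tables
    ]
```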

If I know that the table columns will always be [Name, Type, ... , Color], is there a way to pass this info to the FormParser processor to help it better determine the header rows?

Thank you in advance for your time!

score 1 · Accepted Answer · answered Jun 27 '23 at 16:50

You can't add any "hints" for the Form Parser to adjust the model at this time. You can try using a different version of the Form Parser model to see if the results are more like what you would expect.
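Since the model takes no column hints, one possible workaround is to apply the known column list after extraction. The sketch below is a hypothetical post-processing step, not a Form Parser feature: `reconcile_header` searches the extracted rows (header and body, as plain strings) for the row matching the expected columns, ignoring case and extra whitespace, and returns the rows after it as dicts. It assumes the real header text appears somewhere in the table output.

```python
def normalize(cell: str) -> str:
    """Collapse whitespace and ignore case when comparing header cells."""
    return " ".join(cell.split()).casefold()


def reconcile_header(
    expected: list[str],
    header_rows: list[list[str]],
    body_rows: list[list[str]],
) -> list[dict[str, str]]:
    """Hypothetical helper: key data rows by a known column list.

    Rows before the matching header (for example stray text picked up as a
    header) are skipped. Raises ValueError if no row matches the columns.
    """
    target = [normalize(c) for c in expected]
    rows = header_rows + body_rows
    for i, row in enumerate(rows):
        if [normalize(c) for c in row] == target:
            data = rows[i + 1:]
            break
    else:
        raise ValueError("no extracted row matches the expected columns")
    # Rows whose cell count differs from the schema are dropped rather than guessed.
    return [dict(zip(expected, row)) for row in data if len(row) == len(expected)]
```

This fails loudly instead of guessing when the header text is missing or merged, which keeps mislabeled rows from silently entering the results.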

To extract values from a document using a custom defined schema like you are suggesting, you will likely get the best results using a Custom Document Extractor. You can follow this guide for instructions on how to build a custom processor, and this section about Quick Tables in the labeling documentation could be useful to speed up labeling for tabular data.
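If a Custom Document Extractor is trained with a schema where each table row is a parent entity with one child property per column, the rows can be read back from `document.entities`. This is a sketch under that assumption: the parent label `"pokemon"` and the child labels are hypothetical schema names, and the code only uses the `Document.Entity` fields `type_`, `mention_text`, and `properties` from the Python client.

```python
def entities_to_records(entities, parent_type: str) -> list[dict[str, str]]:
    """Flatten parent entities (one per table row) into {child_label: text} dicts."""
    records = []
    for entity in entities:
        # Skip entities that are not row-level parents in the assumed schema.
        if entity.type_ != parent_type:
            continue
        records.append(
            {child.type_: child.mention_text.strip() for child in entity.properties}
        )
    return records
```

Usage, assuming `document` is the result of `process_document` on the custom processor: `entities_to_records(document.entities, "pokemon")`.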

Thank you very much, Holt! Also, your video guides about Document AI are amazing! — inpap, Jun 28 '23 at 08:23

