Suppose I have a PDF file containing the following table:

Trainer: Giannis

Pokedex: Incomplete

Name        Type          Weight    Height  Color
Pikachu     Electric      6.0 kg    0.4 m   Yellow
Bulbasaur   Grass/Poison  6.9 kg    0.7 m   Green
Charizard   Fire/Flying   90.5 kg   1.7 m   Orange
Jigglypuff  Normal/Fairy  5.5 kg    0.5 m   Pink
Gyarados    Water/Flying  235.0 kg  6.5 m   Blue

I am using the Form Parser to extract the table information.

If I know that the table columns will always be [Name, Type, ... , Color], is there a way to pass this information to the Form Parser processor to help it identify the header row more reliably?

Thank you in advance for your time!

inpap

1 Answer

You can't add any "hints" for the Form Parser to adjust the model at this time. You can try using a different version of the Form Parser model to see if the results are more like what you would expect.

To extract values from a document using a custom-defined schema like the one you are suggesting, you will likely get the best results from a Custom Document Extractor. You can follow this guide for instructions on how to build a custom processor, and this section about Quick Tables in the labeling documentation could help speed up labeling for tabular data.
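Since the Form Parser itself can't take hints, one client-side workaround is to locate the header row yourself after parsing. The following is only a minimal sketch, not part of the Document AI API: it assumes you have already flattened each detected table's header and body rows into plain lists of cell strings, and it uses the five column names from the sample table in the question. The names `EXPECTED_COLUMNS`, `find_header_index`, and `rows_to_records` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, Optional

# Assumption: the known column names, taken from the sample table in the question.
EXPECTED_COLUMNS = ["Name", "Type", "Weight", "Height", "Color"]


def normalize(cell: str) -> str:
    """Collapse whitespace and ignore case so OCR noise matters less."""
    return " ".join(cell.split()).casefold()


def find_header_index(
    rows: List[List[str]], expected: List[str] = EXPECTED_COLUMNS
) -> Optional[int]:
    """Return the index of the first row whose cells match the expected columns."""
    target = [normalize(c) for c in expected]
    for i, row in enumerate(rows):
        if [normalize(c) for c in row] == target:
            return i
    return None


def rows_to_records(
    rows: List[List[str]], expected: List[str] = EXPECTED_COLUMNS
) -> List[Dict[str, str]]:
    """Map every data row to a dict keyed by the known column names.

    If no header row is found, all rows are treated as data.
    Rows with an unexpected number of cells are skipped.
    """
    header_index = find_header_index(rows, expected)
    start = 0 if header_index is None else header_index + 1
    return [
        dict(zip(expected, row))
        for row in rows[start:]
        if len(row) == len(expected)
    ]


# Example: the parser put the header among the body rows, with odd casing.
parsed_rows = [
    ["name", "type", "weight", "height", "color"],
    ["Pikachu", "Electric", "6.0 kg", "0.4 m", "Yellow"],
    ["Gyarados", "Water/Flying", "235.0 kg", "6.5 m", "Blue"],
]
records = rows_to_records(parsed_rows)
```

This keeps the known schema on your side of the API: whichever row the parser labels as the header, the records are keyed by the column names you expect.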

Holt Skinner