Can we set tolerance level on regex annotator in Ruta?

Question

I am annotating the borrower name with the rule "Borrower Name" -> BorrowerNameKeyword ( "label" = "Borrower Name"); However, the text comes out of OCR analysis, so at times I might get "Borrower Name" as "B0rr0wer Nane". Is it possible to set a tolerance limit so that such text still gets annotated as BorrowerNameKeyword?

Is there any other approach that could help here? I could think of dictionary correction, but that won't help, as it could also auto-correct words that are already right.

score 1 · Answer 1 · answered Nov 19 '19 at 15:43


You could achieve that with regular expressions in UIMA Ruta. For your particular example, the following rule should work:

"B.rr.wer\\sNa.e" -> BorrowerName;

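Each "." matches any single character, so the rule tolerates a misrecognized character at those positions. Likewise, you can create more variants of the regular expression to cover other OCR errors.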
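Before adding a variant to the script, it can help to try the pattern against a few OCR samples outside Ruta. The following is a minimal sketch in plain Java, assuming that Ruta's regular-expression rules follow java.util.regex syntax. The class name OcrPatternCheck, the helper matches, and the character classes [o0] and [mn] are illustrative choices, not part of the original answer or of the Ruta API.

```java
import java.util.regex.Pattern;

public class OcrPatternCheck {
    // Hypothetical variant of the rule's pattern: each OCR-prone letter is
    // replaced by a small character class instead of the "." wildcard.
    static final Pattern BORROWER_NAME =
            Pattern.compile("B[o0]rr[o0]wer\\s+Na[mn]e");

    // Returns true when the whole input matches the tolerant pattern.
    static boolean matches(String text) {
        return BORROWER_NAME.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Borrower Name", "B0rr0wer Nane", "Barrower Name"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + matches(sample));
        }
    }
}
```

The character classes are stricter than ".": they accept only the listed substitutions, so "Barrower Name" is rejected, whereas "B.rr.wer\\sNa.e" would accept it. Which of the two is preferable depends on how varied the OCR errors are in your documents.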


Viorel Morari


Please consider marking it as solution if it answers your question. – Viorel Morari Nov 28 '19 at 12:06
