How to change a Hunspell affix file to allow numbers in words?

Question

OCR programs often confuse similar-looking characters, such as the capital letter O and the digit zero, or the lowercase letter l and the digit one. For example, they might recognize Over as 0ver or well as we11.

I tried to add

          REP 0 O
          REP 1 l

to the affix file, but it didn't work because numbers are apparently considered word boundaries.
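
The boundary behaviour described above can be illustrated with a toy tokenizer. This is a simplified sketch, not Hunspell's tokenizer: `tokenize` and its `extra_wordchars` parameter are hypothetical names. For reference, the hunspell(5) man page documents a WORDCHARS directive that extends the command-line tokenizer with additional word characters, which is the nearest real counterpart to `extra_wordchars`; whether it resolves this particular case has not been verified here.

```python
import string


def tokenize(text: str, extra_wordchars: str = "") -> list[str]:
    """Split text into words made of letters plus any extra word characters.

    Simplified model only; this is not Hunspell's tokenizer.
    """
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalpha() or ch in extra_wordchars:
            current.append(ch)
        elif current:
            # Any character that is neither a letter nor an extra word
            # character (space, punctuation, or a digit by default) ends the word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    ocr_text = "0ver the hill we11 done"
    # Digits act as boundaries, so the checker only ever sees "ver" and "we".
    print(tokenize(ocr_text))  # ['ver', 'the', 'hill', 'we', 'done']
    # Treating digits as word characters keeps the OCR errors intact.
    print(tokenize(ocr_text, extra_wordchars=string.digits))
    # ['0ver', 'the', 'hill', 'we11', 'done']
```

Only when the digits survive tokenization does an entry such as REP 0 O have anything to match.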

(I had a look at the hunspell man page, but I can't figure out which of the numerous settings needs to be changed to allow numbers in words.)

Accepted Answer · answered Jul 20 '16 at 11:09

From the hunspell(5) man page:

REP what replacement
    This table specifies modifications to try first. First REP is the header of this table and one or more REP data line are following it. With this table, Hunspell can suggest the right forms for the typical spelling mistakes when the incorrect form differs by more than 1 letter from the right form. The search string supports the regex boundary signs (^ and $). For example a possible English replacement table definition to handle misspelled consonants:

          REP 5
          REP f ph
          REP ph f
          REP tion$ shun
          REP ^cooccurr co-occurr
          REP ^alot$ a_lot
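
The REP semantics quoted above (try each table entry as a substitution, with ^ and $ as word-boundary anchors) can be sketched in Python. This is a simplified, hypothetical model, not Hunspell's implementation: `rep_candidates` and `REP_TABLE` are illustrative names, each candidate replaces a single match of the pattern, and reading `_` as a space is an assumption based on the `a_lot` entry.

```python
import re

# Simplified model of REP-based suggestion candidates (not Hunspell's code).
# Assumptions: "^" anchors a pattern to the start of the word, "$" to the end,
# and "_" in a replacement stands for a space (as the a_lot entry suggests).
REP_TABLE = [
    ("f", "ph"),
    ("ph", "f"),
    ("tion$", "shun"),
    ("^cooccurr", "co-occurr"),
    ("^alot$", "a_lot"),
]


def rep_candidates(word: str, table: list[tuple[str, str]]) -> list[str]:
    """Return one candidate per (entry, non-overlapping match), replacing only that match."""
    candidates: list[str] = []
    for pattern, replacement in table:
        literal = pattern
        head = tail = ""
        if literal.startswith("^"):
            head, literal = "^", literal[1:]
        if literal.endswith("$"):
            tail, literal = "$", literal[:-1]
        # Escape the literal part so only the anchors have regex meaning.
        regex = head + re.escape(literal) + tail
        out = replacement.replace("_", " ")
        for match in re.finditer(regex, word):
            candidates.append(word[: match.start()] + out + word[match.end():])
    return candidates


if __name__ == "__main__":
    print(rep_candidates("fonetic", REP_TABLE))  # ['phonetic']
    print(rep_candidates("alot", REP_TABLE))     # ['a lot']
    print(rep_candidates("0ver", [("0", "O")]))  # ['Over']
```

Because this sketch replaces one match per candidate, `rep_candidates("we11", [("1", "l")])` yields `wel1` and `we1l` rather than `well`; whether Hunspell applies REP entries the same way is not checked here.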

Did you add the header line first, that is, REP followed by the number of replacement entries?

Max Uppenkamp

Thanks for your answer. I did add the header with the number of entries, and all the other REP statements work. The one that doesn't work is: `REP 0 O` – Nemo XXX Jul 20 '16 at 14:18
Do you think it's possible that Hunspell sees REP followed by an integer and interprets it as a header instead of a replacement? In that case, placing the zero replacement at the end of the table might work. If it doesn't, I'm afraid that's an oversight in the Hunspell implementation. – Max Uppenkamp Jul 20 '16 at 14:47
You're probably right; the Hunspell parser seems to get confused by numbers in REP statements. – Nemo XXX Jul 21 '16 at 09:55
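
The header hypothesis from the comments can be made concrete with two toy parsers. Both are hypothetical sketches, not Hunspell's parser: `parse_positional` takes the first REP line as the header and reads exactly the promised number of entries after it, so REP 0 O is read as an ordinary entry and a wrong count is reported; `parse_naive` treats any REP line whose second field is an integer as a header, which would silently drop REP 0 O. Which behaviour Hunspell actually has is not established here.

```python
# Two hypothetical ways a parser might read a REP table (neither is Hunspell's code).
AFFIX_LINES = [
    "REP 3",
    "REP f ph",
    "REP 0 O",
    "REP 1 l",
]


def parse_positional(lines: list[str]) -> list[tuple[str, str]]:
    """Treat the first REP line as the header and read exactly N entries after it."""
    header = lines[0].split()
    if len(header) != 2 or header[0] != "REP" or not header[1].isdigit():
        raise ValueError(f"bad REP header: {lines[0]!r}")
    count = int(header[1])
    entries: list[tuple[str, str]] = []
    for line in lines[1 : 1 + count]:
        fields = line.split()
        if len(fields) != 3 or fields[0] != "REP":
            raise ValueError(f"bad REP entry: {line!r}")
        entries.append((fields[1], fields[2]))
    if len(entries) != count:
        raise ValueError(f"header promises {count} entries, found {len(entries)}")
    return entries


def parse_naive(lines: list[str]) -> list[tuple[str, str]]:
    """Flawed variant: any REP line whose second field is an integer is taken as a header."""
    entries: list[tuple[str, str]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "REP":
            continue
        if fields[1].isdigit():
            continue  # misreads "REP 0 O" and "REP 1 l" as headers and drops them
        entries.append((fields[1], fields[2]))
    return entries


if __name__ == "__main__":
    print(parse_positional(AFFIX_LINES))  # [('f', 'ph'), ('0', 'O'), ('1', 'l')]
    print(parse_naive(AFFIX_LINES))       # [('f', 'ph')]
```

Under the naive model, moving the zero entry to the end of the table would not help, since every line is classified on its own content rather than its position.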