
I'm creating a custom spell check engine implementation using an open-source Hunspell dic/aff set as a starting point. After an inordinate amount of hacking, googling, etc., I copied a flag set that appears to result in case-insensitive searching (e.g., the spell check passes "Word" as well as "word" when only "word" is present in the .dic file). Problem is, I have no idea WHY this works, and I can't find anything online or in the files indicating how case is treated. The syntax in my .dic file that works is:

word/1   1

Without these flags, case handling is strict.

I am loath to implement a "solution" I can't explain. Does anyone have any idea how to specify case handling in a dic/aff set, so that I can figure out what's actually happening?

wolfmason

1 Answer


By default, entries in .dic files are treated as case-insensitive with respect to the usual casings. Given the lowercase entry word, all of the following should be returned as correctly spelled: word, Word, WORD.

If you want to restrict words to a single casing, you'll need to define a case-sensitive flag in your .aff file:

KEEPCASE X

Here X is a one- or two-letter flag or a number (depending on your FLAG setting).

So, if I only wanted to allow John (but not john or JOHN), I could then put the following entry in my .dic file:

John/X
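Since you're writing your own engine, here is a rough Python sketch of how that default case handling plus KEEPCASE could be modeled. It is a simplification under my own assumptions, not Hunspell's actual code: `entries`, `candidate_forms`, and `is_accepted` are hypothetical names, and `entries` is assumed to map each .dic word to the set of flags after its slash.

```python
def candidate_forms(token):
    """Yield dictionary forms that a differently cased token may match."""
    if token.isupper():
        # ALL CAPS may match a lowercase or a Capitalized entry.
        yield token.lower()
        yield token.capitalize()
    elif token[:1].isupper() and token[1:].islower():
        # An initial capital may match a lowercase entry.
        yield token.lower()


def is_accepted(token, entries, keepcase_flag=None):
    """Return True if token matches an entry under the sketched case rules.

    entries: dict mapping each .dic word to its set of flags (assumed shape).
    keepcase_flag: the flag named by KEEPCASE in the .aff file, if any.
    """
    if token in entries:
        return True  # the exact casing from the .dic file is always fine
    for form in candidate_forms(token):
        flags = entries.get(form)
        # A KEEPCASE entry must not be reached through a case variant.
        if flags is not None and keepcase_flag not in flags:
            return True
    return False
```

With `entries = {"word": set(), "John": {"X"}}` and `keepcase_flag="X"`, this accepts word, Word, WORD, and John, and rejects JOHN and john.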

I do not know why the /1 is allowing case-insensitive results, especially because that line is badly formatted: the only thing that should follow the spaces is a field ID and its information, for example, po:noun.
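To see how your engine might be reading that line, it may help to split it the way the format describes and check what is left over. The following is only a sketch under my assumptions: `parse_dic_line` and `FIELD_RE` are hypothetical names, fields are assumed to look like a two-letter ID plus a colon (as in po:noun), and words containing spaces or escaped slashes are not handled.

```python
import re

# Assumed shape of a morphological field: two-letter ID, colon, value.
FIELD_RE = re.compile(r"^[a-z]{2}:\S+$")


def parse_dic_line(line):
    """Split a .dic line into (word, raw_flags, fields, suspicious)."""
    head, *rest = line.split()
    # Everything after the first slash is the raw flag string; how it
    # breaks into individual flags depends on the FLAG setting.
    word, _, raw_flags = head.partition("/")
    fields = [f for f in rest if FIELD_RE.match(f)]
    suspicious = [f for f in rest if not FIELD_RE.match(f)]
    return word, raw_flags, fields, suspicious
```

For the line from the question, `parse_dic_line("word/1   1")` gives the word `word`, the raw flag string `1`, no well-formed fields, and the trailing `1` as a suspicious token. Whatever your engine does with that trailing token is a good place to start looking.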

You said you're creating a custom engine, so the problem is going to be in your engine, not in the .dic/.aff files.

user0721090601