Using a BreakIterator.getWordInstance(), the text "can't" is considered one "word".
A little experimentation shows that while an apostrophe within a word is considered part of the word, apostrophes at either end are treated as separate from the word; that is, a word boundary is reported between the apostrophe and the letters.
This precludes words like "'tis" and "dogs'" from being considered "words", even though they are spelled correctly.
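For reference, here is a minimal sketch of the kind of experiment described above. The class name WordBoundaryDemo, the helper segments, and the choice of Locale.ENGLISH are assumptions made for illustration only; the reported boundaries may differ with other locales or JDK versions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordBoundaryDemo {
    // Splits the text at every boundary reported by the word BreakIterator.
    // Locale.ENGLISH is an assumption; the default locale could be used instead.
    static List<String> segments(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> parts = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            parts.add(text.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Print the segments for an inner, a leading, and a trailing apostrophe.
        for (String s : new String[] {"can't", "'tis", "dogs'"}) {
            System.out.println(s + " -> " + segments(s));
        }
    }
}
```

Each list element is the text between two consecutive boundaries, so a leading or trailing apostrophe that is split off shows up as its own element.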
Is there a way to correct this behaviour, or is this a bug?