I'm trying to build an application in Rails that will help users read Chinese text. If a user clicks on a Chinese character, they'd get information about the pronunciation and meaning.
I got this working using a Chinese-English dictionary database. However, I'm not sure how to detect whether a character stands alone or is part of a longer word. For example, suppose I have the text 我是铁公鸡 and the user clicks on the character 公, which on its own means "public". The app should instead highlight 铁公鸡 and show it as "miser". So a character can be a standalone word or form words with the characters around it.
What's an efficient way to detect what word the character forms? I was thinking of checking the target character and its neighbors against the database and choosing the longest combination that can be found. Any other ideas?
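To make the idea concrete, here is a rough sketch of the longest-match approach in plain Ruby. It is only an illustration under some assumptions: the `DICTIONARY` hash stands in for the real database table, and the names `longest_word_at` and `MAX_WORD_LENGTH` (an assumed cap on word length to limit lookups) are made up for this example.

```ruby
# Hypothetical stand-in for the dictionary table; a real app would query the database.
DICTIONARY = {
  "我" => "I; me",
  "是" => "to be",
  "铁" => "iron",
  "公" => "public",
  "鸡" => "chicken",
  "公鸡" => "rooster",
  "铁公鸡" => "miser"
}.freeze

# Assumed upper bound on word length, so the number of lookups stays small.
MAX_WORD_LENGTH = 4

# Returns [start_index, word] for the longest dictionary entry that covers
# the character at `index`, or nil if the index is out of range or nothing matches.
def longest_word_at(text, index, dictionary = DICTIONARY, max_len = MAX_WORD_LENGTH)
  chars = text.chars
  return nil if index.negative? || index >= chars.length

  # Try the longest candidates first, so the first hit is the longest match.
  max_len.downto(1) do |len|
    # Every window of length `len` that contains the clicked character.
    first_start = [index - len + 1, 0].max
    last_start  = [index, chars.length - len].min
    (first_start..last_start).each do |start|
      candidate = chars[start, len].join
      return [start, candidate] if dictionary.key?(candidate)
    end
  end
  nil
end
```

With the sample text, `longest_word_at("我是铁公鸡", 3)` returns `[2, "铁公鸡"]`, so the app would know both which word to show and where the highlight starts. Against a real database, the candidate substrings could probably be collected first and looked up in a single query rather than one query per candidate.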