From the documentation on FT.CREATE:
LANGUAGE {default_lang}
if set, indicates the default language for documents in the index. Default is English.
LANGUAGE_FIELD {lang_attribute}
is document attribute set as the document language.
A stemmer is used for the supplied language during indexing. If an unsupported language is sent, the command returns an error. The supported languages are Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish, and Chinese.
When adding Chinese language documents, set LANGUAGE chinese for the indexer to properly tokenize the terms. If you use the default language, then search terms are extracted based on punctuation characters and whitespace. The Chinese language tokenizer makes use of a segmentation algorithm (via Friso), which segments text and checks it against a predefined dictionary. See Stemming for more information.
So you have to define your index as a Chinese index for RediSearch to use the right tokenizer when indexing.
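For reference, here is a minimal sketch of where LANGUAGE and LANGUAGE_FIELD sit in the raw FT.CREATE command. The helper `build_ft_create_args` and the index name `idx` are hypothetical, made up for illustration. The language check is a client-side convenience based on the list quoted above, not something the server needs you to do.

```python
# Languages listed in the FT.CREATE documentation quoted above (lowercased).
SUPPORTED_LANGUAGES = {
    "arabic", "basque", "catalan", "danish", "dutch", "english", "finnish",
    "french", "german", "greek", "hungarian", "indonesian", "irish",
    "italian", "lithuanian", "nepali", "norwegian", "portuguese", "romanian",
    "russian", "spanish", "swedish", "tamil", "turkish", "chinese",
}


def build_ft_create_args(index_name, schema, language=None,
                         language_field=None, on="JSON"):
    """Hypothetical helper: build the argument list for FT.CREATE.

    schema is a list of (path, alias, field_type) tuples.
    """
    if language is not None and language.lower() not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    args = ["FT.CREATE", index_name, "ON", on]
    if language is not None:
        # Default language for every document in the index.
        args += ["LANGUAGE", language.lower()]
    if language_field is not None:
        # Document attribute that holds a per-document language.
        args += ["LANGUAGE_FIELD", language_field]
    args.append("SCHEMA")
    for path, alias, field_type in schema:
        args += [path, "AS", alias, field_type]
    return args


args = build_ft_create_args(
    "idx",  # hypothetical index name
    [("$.name", "name", "TEXT"), ("$.num", "num", "NUMERIC")],
    language="chinese",
)
print(" ".join(args))
```

With redis-py, a list like this could be sent as `client.execute_command(*args)`, assuming `client` is a connected `redis.Redis` instance.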
Using redis-py, the language should be passed as an attribute of IndexDefinition:
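import redis
from redis.commands.search.field import NumericField, TextField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType  # path as in redis-py 4.x/5.x
client = redis.Redis()  # assumes a local Redis server with the search module loaded
definition = IndexDefinition(language="Chinese", index_type=IndexType.JSON)
client.ft().create_index(
    (TextField("$.name", as_name="name"), NumericField("$.num", as_name="num")),
    definition=definition,
)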
Note that you also have to specify the language on the query itself. From the FT.SEARCH documentation:
LANGUAGE {language}
use a stemmer for the supplied language during search for query expansion. If querying documents in Chinese, set to chinese to properly tokenize the query terms. Defaults to English. If an unsupported language is sent, the command returns an error. See FT.CREATE for the list of languages.
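As a rough sketch of the raw command, LANGUAGE follows the query string in FT.SEARCH. The helper `build_ft_search_args`, the index name `idx`, and the sample query are hypothetical and only illustrate the argument order.

```python
def build_ft_search_args(index_name, query, language=None):
    """Hypothetical helper: build the argument list for FT.SEARCH."""
    args = ["FT.SEARCH", index_name, query]
    if language is not None:
        # Without this, the query is processed with the English default.
        args += ["LANGUAGE", language]
    return args


search_args = build_ft_search_args("idx", "你好", language="chinese")
print(search_args)
```

As with index creation, such a list could be passed to `client.execute_command(*search_args)` on a connected redis-py client.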
And again, using redis-py, the query should look something like this:
from redis.commands.search.query import Query
q = Query("$hello").language("Chinese")
res = client.ft().search(q)