Is it possible to do accent-insensitive search with RediSearch? I need the same functionality that SQL Server collations provide.

For example, the index contains the string "Atsargų likučiai pagal sandėlius". It should be found by the query string "likučiai" as well as by "likuciai".

A quick and dirty solution would be to store two versions of the text, the original and a normalized one, then normalize all queries, search the normalized versions, and return the original versions as results. But with millions of documents this would consume a significant amount of memory. Is there a clean way to accomplish this?
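
For reference, the normalization step in that workaround could look roughly like the Python sketch below. It assumes that decomposing the text to Unicode NFD and dropping the combining marks is an acceptable definition of "accent-insensitive"; the helper names `strip_accents` and `normalize_for_search` are made up for illustration and are not part of RediSearch.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining diacritical marks removed."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, so skip those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: the same normalization for indexed text and queries."""
    return strip_accents(text).casefold()


original = "Atsargų likučiai pagal sandėlius"
indexed = normalize_for_search(original)  # value to store in the searchable field
query = normalize_for_search("likučiai")  # value to send as the query term

print(indexed)
print(query in indexed.split())
```

The same function has to be applied both at indexing time and at query time, otherwise the two sides will not agree on the normalized form.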

callOfCode
  • Did you consider fuzzy matching? https://oss.redislabs.com/redisearch/Query_Syntax.html#fuzzy_matching – Guy Korland Jun 03 '19 at 14:54
  • @GuyKorland Yes, but there are some words that differ by a single letter yet have a totally different meaning. I need exact search, only accent-insensitive. The possibility to use a pipe like *liku(č|c)iai* would be invaluable. Thanks for your help! – callOfCode Jun 04 '19 at 08:15
  • https://github.com/RediSearch/RediSearch/issues/718 – Guy Korland Jun 04 '19 at 12:03

1 Answer

I was able to find your document by combining the PHONETIC "dm:fr" option in the schema definition with fuzzy search.

  1. Create a new schema with the PHONETIC "dm:fr" option:
FT.CREATE test_phonetic SCHEMA title TEXT PHONETIC "dm:fr"
  2. Add a document:
FT.ADD test_phonetic doc_1 0.5 FIELDS title "Atsargų likučiai pagal sandėlius"
  3. Search using fuzzy search:
FT.SEARCH test_phonetic "@title:%likučiai%" NOCONTENT WITHSCORES
# returns doc_1 successfully
FT.SEARCH test_phonetic "@title:%likuciai%" NOCONTENT WITHSCORES
# returns doc_1 successfully too
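
A note on why the fuzzy part probably matches: the %term% syntax in RediSearch matches terms within a Levenshtein distance of 1, and "likučiai" and "likuciai" differ by a single substitution. The Python sketch below only illustrates that distance calculation. It is not RediSearch's implementation, and it assumes the distance is counted per character rather than per byte. It also shows the caveat raised in the comments: any word one edit away matches, not only accent variants.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]


# Accent variant: one substitution (č -> c), so a distance-1 fuzzy query matches.
print(levenshtein("likučiai", "likuciai"))

# Illustrative unrelated pair: also distance 1, so fuzzy matching alone
# cannot tell an accent variant from a different word.
print(levenshtein("cat", "cut"))
```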
Jona Rodrigues