1

I would like to use Levenshtein and Im looking for some examples. I already read the documentation, but I dont know how to implement it. I tried to build my own Analyzer, but it crashed everytime I used it.

Here is the documentation I follwed: https://crate.io/docs/reference/sql/fulltext.html

Example table:

CREATE TABLE IF NOT EXISTS "test"."accounts" (
   "customer_name" STRING INDEX USING FULLTEXT WITH (
      analyzer = 'standard'
   ),
"customer_no" STRING,
   PRIMARY KEY ("customer_no")
)


INSERT INTO "test"."accounts" 
(customer_name, customer_no)
VALUES('Walmart','C00001');

My goal will be to search for Wal-mart and return Walmart.

Wk Ali
  • 21
  • 1
  • 4

1 Answers1

3

The standard analyzer you using for this example would split the search word ‘wal-mart’ (because of the hyphen) into two tokens - ‘wal’ and ‘mart’. Since this is probably not what you want for the described use case, i would recommend to add a custom analyzer such as:

create ANALYZER lowercase_keyword (
    TOKENIZER keyword,
    TOKEN_FILTERS (
        lowercase
    )
);

This will index the word as it is - except turning it into lowercase.

Then create a table with the new created analyzer and add some data:

CREATE TABLE IF NOT EXISTS "test"."accounts" (
   "customer_name" STRING INDEX USING FULLTEXT WITH (
      analyzer = 'lowercase_keyword'
   ),
"customer_no" STRING,
   PRIMARY KEY ("customer_no")
);

INSERT INTO "test"."accounts" (customer_name, customer_no) VALUES ('Walmart', 'C00001'), ('Wal-mart', 'C00002'), ('wal-mart', 'C00003'), ('wal- mart’, ’C00004');

Now the query given below returns ‘Walmart’, ‘Wal-mart’ and ‘wal-mart’:

select customer_name from test.accounts where match(customer_name, 'walmart') using best_fields with (fuzziness=1);

With a fuzziness of 2 the query would additionaly return ‘wal- mart’.

christian
  • 56
  • 3