Say I have a job node for a financial administrator: (j:Job {name: 'financial administrator'}).

People use many different titles for a 'financial administrator'. I therefore want this job returned as a hit even when someone types only 'financial' or 'administrator', or when their input contains a typo (such as 'fynancial').

CONTAINS only returns results on an exact substring match, so it fails as soon as the input contains a typo.

Thanks a lot!

Rob Brand

2 Answers

First, you could try fuzzy matching with a full-text index and see if it solves the issue. For example, set up the index:

CALL db.index.fulltext.createNodeIndex('jobs', ['Job'], ['name'], {})

Then query the index with fuzzy matching (note the ~):

CALL db.index.fulltext.queryNodes('jobs', 'fynancial~')
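
To make the ~ operator less of a black box: as far as I know, Lucene's fuzzy search accepts indexed terms within a small edit distance of the query term (2 edits by default). The class below is only a minimal sketch of plain Levenshtein distance, not Lucene's actual implementation (which, to my knowledge, also counts a swap of two adjacent characters as one edit). The class and method names are made up for illustration.

```java
// Illustrative sketch only: plain Levenshtein distance, the idea behind fuzzy matching.
// EditDistanceSketch and levenshtein are hypothetical names, not part of Neo4j or Lucene.
final class EditDistanceSketch {

    /** Number of single-character insertions, deletions, or substitutions that turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // 'fynancial' differs from 'financial' by one substitution (y -> i).
        System.out.println(levenshtein("fynancial", "financial"));
    }
}
```

Because the typo in the question is a single substitution, it falls well inside the default tolerance, which is why 'fynancial~' can find 'financial administrator'.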

If you want to go further and use Lucene's phonetic searches, then you could write a little Java code to register a custom analyzer.

Include the lucene-analyzers-phonetic dependency like so:

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-phonetic</artifactId>
    <version>8.5.1</version>
</dependency>

Then create a custom analyzer:

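// Imports assume Neo4j 4.x package names; adjust them if your version differs.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.phonetic.DoubleMetaphoneFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.neo4j.annotations.service.ServiceProvider;
import org.neo4j.graphdb.schema.AnalyzerProvider;

@ServiceProvider
public class PhoneticAnalyzer extends AnalyzerProvider {

    public PhoneticAnalyzer() {
        // "phonetic" is the name used in the index config: {analyzer: 'phonetic'}
        super("phonetic");
    }

    @Override
    public Analyzer createAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer tokenizer = new StandardTokenizer();
                // Max code length 6; inject=true keeps the original tokens next to the phonetic codes.
                TokenStream stream = new DoubleMetaphoneFilter(tokenizer, 6, true);
                return new TokenStreamComponents(tokenizer, stream);
            }
        };
    }
}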

I used the DoubleMetaphoneFilter, but you can experiment with other filters. Package the class as a jar, put it into Neo4j's plugins directory along with the Lucene phonetic jar, and restart the server. Then create a full-text index using this analyzer:

CALL db.index.fulltext.createNodeIndex('jobs', ['Job'], ['name'], {analyzer:'phonetic'})

Querying the index looks the same:

CALL db.index.fulltext.queryNodes('jobs', 'fynancial')

Luanne

It took a while, but this is how I solved my problem:

MATCH (a)-[:IS]->(hs)
UNWIND a.naam AS namelist
CALL apoc.text.phonetic(namelist) YIELD value
WITH value AS search_str, SPLIT('INPUT FROM DATABASE', ' ') AS input, a
CALL apoc.text.phonetic(input) YIELD value
WITH value AS match_str, search_str, a
WHERE search_str CONTAINS match_str OR search_str = match_str
RETURN DISTINCT a.naam, labels(a)
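
For anyone wondering why this catches typos: as far as I understand, apoc.text.phonetic produces a US English Soundex code per word, and Soundex drops vowels (and y) after the first letter, so 'fynancial' and 'financial' end up with the same code. Below is a rough Java sketch of classic Soundex to show the idea. It is not APOC's code, the class and method names are invented, and the word-by-word comparison in matchesAnyWord is a simplification of the CONTAINS check in the query.

```java
import java.util.Locale;

// Illustrative sketch only: classic American Soundex. SoundexSketch is a hypothetical name.
final class SoundexSketch {

    // Code per letter A..Z: '0' = vowel or y (dropped), '-' = h or w (skipped entirely).
    private static final String CODES = "0123012-02245501262301-202";

    /** Returns the 4-character Soundex code of a single word, or "" if it has no letters. */
    static String soundex(String word) {
        String letters = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (letters.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(letters.charAt(0));
        char last = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && out.length() < 4; i++) {
            char code = CODES.charAt(letters.charAt(i) - 'A');
            if (code == '-') {
                continue; // h and w do not separate two letters with the same code
            }
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets 'last', so the same code after a vowel is kept
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    /** True if any word of the stored name has the same Soundex code as the typed word. */
    static boolean matchesAnyWord(String storedName, String typedWord) {
        String wanted = soundex(typedWord);
        if (wanted.isEmpty()) {
            return false;
        }
        for (String part : storedName.trim().split("\\s+")) {
            if (soundex(part).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(soundex("financial"));
        System.out.println(soundex("fynancial"));
        System.out.println(matchesAnyWord("financial administrator", "fynancial"));
    }
}
```

Keep in mind that Soundex is coarse: quite different words can share a code, so this approach can also return hits you did not intend.
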
Rob Brand