
It is very easy to write case-insensitive Cypher queries. I am now trying to figure out whether there is an easy way to write accent-insensitive queries. I am thinking of a query similar to:

MATCH (n:City) WHERE n.Name =~ '(?a)Montreal' RETURN n

Has anyone found a solution to this? Do I have to rely on creating a full-text Lucene index along with a custom analyzer?

2 Answers


Schema indexes in Neo4j 2.0 currently do not allow you to configure analyzers. This might be added in a subsequent version of Neo4j. In the meantime, you can either go with legacy indexes (which let you customize analyzers) or normalize the strings on the application side.

Stefan Armbruster
  • Thanks. When you say normalize the strings on the application side do you mean that everything in the graph should be Accent-free or that we should keep two strings in the graph for every string, one being the display string and the other the searchable accent-free string? – Martin Larivière Jan 13 '14 at 17:16
  • I'd guess that storing two versions of the string in different properties, e.g. name and normalized_name, might be the best approach. You can even automate that by implementing and registering a TransactionEventHandler (http://api.neo4j.org/2.0.0/org/neo4j/graphdb/event/TransactionEventHandler.html). – Stefan Armbruster Jan 13 '14 at 17:21
  • That's what I thought. I am trying to evaluate Neo4j against a common MySQL database. In MySQL all of this is more or less automated. I will try a few things (such as the one you are suggesting) along with the legacy index with a custom analyzer and see how it goes. Thanks. – Martin Larivière Jan 13 '14 at 17:36
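
For what it's worth, the two-property idea from the comments above can be sketched on the application side roughly like this. This is only an illustration, not something from the answer itself: the helper names (fold_accents, city_properties, lookup_params) and the lookup query are hypothetical, the property names name and normalized_name are taken from the comment, and the query uses the Neo4j 2.0-style {param} placeholder. It assumes that stripping Unicode combining marks is an acceptable definition of "accent-free" for your data.

```python
import unicodedata


def fold_accents(text):
    """Return text case-folded with combining accent marks removed."""
    # NFKD splits a character like "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def city_properties(display_name):
    """Build the node properties: the display string plus a searchable key."""
    return {"name": display_name, "normalized_name": fold_accents(display_name)}


# Hypothetical lookup query; the search runs against the normalized property.
LOOKUP_QUERY = "MATCH (n:City) WHERE n.normalized_name = {q} RETURN n"


def lookup_params(user_input):
    """Normalize the user's search text the same way as the stored key."""
    return {"q": fold_accents(user_input)}
```

The important part is that the stored key and the search text go through the same function, so "Montréal", "Montreal" and "MONTREAL" all end up as the same key. Note that ligatures such as "œ" have no Unicode decomposition and would need an explicit replacement if they matter for your data.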

A Lucene full-text index is case-insensitive by default, so no custom analyzer is needed.

Another option is to store a lowercase version of your information in the graph as well and use that for lookups or searches. I know it's a weak workaround.

Michael Hunger
  • Thanks. When I said `Custom analyzer`, it was because I also need to perform `Accent Insensitive` searches. In French, we have a lot of accents (éèçàî and so on) that need to be converted to their unaccented forms (eecai...). – Martin Larivière Jan 14 '14 at 15:18
  • I was thinking about using duplicate fields for every field that I need to search on, but I found that Neo4j is very slow when I search across a large number of nodes compared with what I can do in MySQL. I want to test it with a Lucene index to see if that improves the performance! – Martin Larivière Jan 14 '14 at 15:23