
I have a graph database (Neo4j) in which I configured a property to be auto-indexed with full-text indexing. Everything is working great, except that one row is not returned when I execute a particular Cypher query.

The property value in the graph is as follows (the fragments I search for, "restaurant" and "411", appear near the end):

1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantmontrealmontrealquebeccanada5148435411

If I execute the following Cypher query:

START n1=NODE:node_auto_index('Search_Field:*res* AND Search_Field:*taurant* AND Search_Field:*411*')
RETURN n1.Search_Field

My row is returned! So far no problem!

But when I search for the word « restaurant » as a single term, like this:

START n1=NODE:node_auto_index('Search_Field:*restaurant* AND Search_Field:*411*')
RETURN n1.Search_Field

Then no rows are returned.
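
For reference, the two query strings above differ only in the list of fragments. Here is a minimal Python sketch of how such a Lucene query string could be assembled. `escape_lucene` and `build_wildcard_query` are hypothetical helper names, not part of Neo4j or Lucene, and whether backslash escapes are honored inside the START clause is an assumption I have not verified:

```python
import re

# Characters that have special meaning in the Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(fragment):
    """Backslash-escape Lucene query syntax characters in a fragment (hypothetical helper)."""
    return _LUCENE_SPECIAL.sub(r'\\\1', fragment)


def build_wildcard_query(field, fragments):
    """Build 'field:*a* AND field:*b*' from a list of fragments (hypothetical helper)."""
    clauses = ["%s:*%s*" % (field, escape_lucene(f)) for f in fragments]
    return " AND ".join(clauses)


# The two variants from the question: the word split in two, and the word as a whole.
split_query = build_wildcard_query("Search_Field", ["res", "taurant", "411"])
whole_query = build_wildcard_query("Search_Field", ["restaurant", "411"])
print(split_query)
print(whole_query)
```

The resulting string is what goes between the quotes in `node_auto_index('...')`.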

I tested a lot of things to try to find a pattern that could explain the problem. It seems that the length of my property value might play a role. I know it sounds strange, but if I add 3 or more letters, let's say « aaa », after the word restaurant in the property value, like this (look at the letters close to the end of the value):

1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantaaamontrealmontrealquebeccanada5148435411

then, if I execute the same Cypher query, the row is now returned.

Has anyone encountered a similar problem? It's driving me crazy!

I have tested on both Neo4j-enterprise 2.2.1 and the latest Community 3.0.0-M02. Same result with both of them.

Any idea where or what I should look for?

1 Answer


The query term gets passed through the Lucene analyzer, just like the contents you index. I'm not 100% sure, but I think that the default analyzer "eats up" the digits; that's why you don't get the results.

You can supply an analyzer class when the index is created for the first time. You can also use the Java API to query the index, which allows you to pass in instances of Lucene Query; see my example at http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/.

Stefan Armbruster
  • I will look at the different analyzers, but I am not sure it's about the digits either: I have about 5000 entries in my graph, all with digits, and when I query them the results are fine except for the specific row I described above. And if I change the value by just adding 3 more letters, the row is found! – Martin Larivière Jan 29 '16 at 13:56
  • Also, what if people enter text in two languages (i.e. French and English)? Do you know of any custom analyzer that handles multiple languages? – Martin Larivière Jan 29 '16 at 14:00
  • A couple of people use a naming convention on their property keys, e.g. `name_en`, `name_fr`. In this case you cannot use auto indexes, but you can use manual indexes, one index per language. – Stefan Armbruster Jan 29 '16 at 14:34
  • The actual indexing can be done selectively in a TransactionEventHandler. – Stefan Armbruster Jan 29 '16 at 14:35
  • Thanks Stefan. Just to let you know, I changed my approach: when I build my Search_Field I now insert a space between each word, and the query now works fine. Maybe a very long string was hard on the analyzer. – Martin Larivière Jan 29 '16 at 15:45
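
The workaround from the last comment (building Search_Field with a space between words instead of one long concatenated string) could look roughly like the sketch below. It assumes the field is built by lowercasing, stripping accents, and dropping punctuation, which is only inferred from the sample value in the question. `normalize_word` and `build_search_field` are hypothetical names, and the input strings are sample data:

```python
import re
import unicodedata


def normalize_word(word):
    """Lowercase a word, strip accents, and drop non-alphanumeric characters."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", word)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]", "", without_accents.lower())


def build_search_field(parts):
    """Join the normalized words of all parts with single spaces (hypothetical helper)."""
    words = []
    for part in parts:
        for raw in part.split():
            word = normalize_word(raw)
            if word:  # skip tokens that were pure punctuation
                words.append(word)
    return " ".join(words)


# Sample input; the real field is built from the node's own properties.
search_field = build_search_field(
    ["Pizzeria Dei Compari", "Restaurant", "Montréal, Québec", "514-843-5411"]
)
print(search_field)
```

With the words separated, the analyzer sees many short tokens rather than a single very long one, so a query such as `Search_Field:*restaurant*` has a whole word to match against.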