Cypher query returns undesirable result

Question

I need to store texts in Neo4j. For each text, I split it into words and create a [:NEXT] relationship from each word to the word that follows it, plus a [:CONTAINS] relationship from the text to each word it contains. Finally, I try to find the word with the most [:NEXT] relationships, counting only the given text and not the whole database.

Unfortunately, I just get the sum over the whole database.

The query is:

    query = '''
    WITH split("%s"," ") as words
    MERGE (p:Post {id: '%s', text: '%s'})
    WITH p, words
    UNWIND range(0,size(words)-2) as idx
    MERGE (w1:Word {name:words[idx]})
    MERGE (w2:Word {name:words[idx+1]})
    MERGE (w1)-[:NEXT]->(w2)
    MERGE (p)-[:CONTAINS]->(w2)
    MERGE (p)-[:CONTAINS]->(w1)
    WITH p
    MATCH (p)-[c:CONTAINS]->(w:Word)
    MATCH ()-[n1:NEXT]->(:Word {name: w.name})<-[:CONTAINS]-(p)
    MATCH (p)-[:CONTAINS]-(:Word {name: w.name})-[n2:NEXT]->()
    WITH COUNT(n1) + COUNT(n2) AS score, w.name AS word, p.text AS post, p.id AS _id
    RETURN post, word, score, _id;
    '''  %(text, id, text)

I just can't figure out what the problem is.

Thanks!

Can you describe what kind of operations this is meant to support? If you're looking to implement fast text search, lookup, and scoring, then there are much better tools already setup to do exactly this, such as ElasticSearch. — InverseFalcon, Aug 15 '16 at 23:31
@InverseFalcon I read about ElasticSearch and I found out that it's not what I am looking for. Thanks, man! — Paulo Fabrício, Aug 16 '16 at 12:18

InverseFalcon · Answer 1 · 2016-08-18T08:44:26.720

Well, you may have a data modeling problem here.

You're using MERGE to create your word nodes, so if a word was already added by any earlier query, the same node is reused. Your most common word nodes (a, the, and, I, etc.) will therefore accumulate many [:NEXT] relationships, and that number keeps growing with each query.

Is this how you intend it to behave, or will you only be asking your db about the words used in the text given in the query?

EDIT

The problem is the merging of the :Word nodes. MERGE matches any :Word node with the same name created by a previous query, and any future query will match the node again. It's not enough to merge the :Word node by itself; to keep your words local to each associated post, you have to merge the word together with its [:CONTAINS] relationship from the post, in the same MERGE pattern.
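To make the difference concrete, here is a small plain-Python model of the two strategies. It is not Neo4j code: it assumes MERGE can be approximated by inserting into a set keyed on whatever identifies the node, and the function name `next_degree` and the two sample posts are made up for illustration.

```python
from collections import Counter


def next_degree(posts, scope_words_to_post):
    """Count NEXT relationships per word node under a simplified MERGE model.

    A set stands in for MERGE: adding the same key twice creates nothing new.
    """
    edges = set()  # one entry per distinct NEXT relationship
    for post_id, text in posts.items():
        words = text.split(" ")
        for first, second in zip(words, words[1:]):
            if scope_words_to_post:
                # Node identity is (post, word): words are private to a post.
                edges.add(((post_id, first), (post_id, second)))
            else:
                # Node identity is the word alone: shared by every post.
                edges.add(((None, first), (None, second)))

    degree = Counter()
    for start, end in edges:
        # Count the relationship in both directions, like ()-[:NEXT]-(w).
        degree[start] += 1
        degree[end] += 1
    return degree


posts = {"p1": "the cat sat", "p2": "the dog ran"}
shared = next_degree(posts, scope_words_to_post=False)
scoped = next_degree(posts, scope_words_to_post=True)

print(shared[(None, "the")])  # shared node: relationships from both posts
print(scoped[("p1", "the")])  # per-post node: only this post's relationships
```

In the shared model, `the` ends up with two NEXT relationships, one contributed by each post, while in the per-post model each post's own `the` has exactly one.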

We can also clean up the patterns used to match to calculate the word score, as all we need is the number of [:NEXT] relationships of any direction from each word.

    query = '''
    WITH split("%s"," ") as words 
    MERGE (p:Post {id: '%s', text: '%s'})
    WITH p, words
    UNWIND range(0,size(words)-2) as idx
    MERGE (p)-[:CONTAINS]->(w1:Word {name:words[idx]})
    MERGE (p)-[:CONTAINS]->(w2:Word {name:words[idx+1]})
    MERGE (w1)-[:NEXT]->(w2)
    WITH DISTINCT p
    MATCH (p)-[:CONTAINS]->(w:Word)
    WITH size( ()-[:NEXT]-(w) ) AS score, w.name AS word, p.text AS post, p.id AS _id
    RETURN post, word, score, _id;
    '''  %(text, id, text)
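Because the statement is built with `%` string formatting, a quote character inside the text can break the query. A possible variant, sketched here under assumptions, passes the values as Cypher parameters instead. The names `POST_QUERY` and `build_post_params` are hypothetical, the `$name` parameter syntax assumes Neo4j 3.0 or later, and the `size()` pattern expression is kept from the query above, so it may need adjusting on newer Neo4j versions.

```python
# Statement modeled on the query above, with Cypher parameters
# ($id, $text, $words) in place of Python string formatting.
POST_QUERY = '''
WITH $words AS words
MERGE (p:Post {id: $id, text: $text})
WITH p, words
UNWIND range(0, size(words) - 2) AS idx
MERGE (p)-[:CONTAINS]->(w1:Word {name: words[idx]})
MERGE (p)-[:CONTAINS]->(w2:Word {name: words[idx + 1]})
MERGE (w1)-[:NEXT]->(w2)
// DISTINCT collapses the one-row-per-index output of UNWIND
WITH DISTINCT p
MATCH (p)-[:CONTAINS]->(w:Word)
WITH size( ()-[:NEXT]-(w) ) AS score, w.name AS word, p.text AS post, p.id AS _id
RETURN post, word, score, _id
'''


def build_post_params(text, post_id):
    """Return the parameter map for POST_QUERY (hypothetical helper).

    The text is split in Python, so no quoting of the raw text is needed.
    """
    return {"id": post_id, "text": text, "words": text.split(" ")}


# A text containing both quote styles, which would break the %-formatted query.
params = build_post_params('She said "it\'s fine" today', "post-1")
print(params["words"])
```

With the official Neo4j Python driver, the pair would be passed along the lines of `session.run(POST_QUERY, params)`; other client libraries have their own way of supplying parameters.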

I just want the words used in the given text. Thank you very much — Paulo Fabrício, Aug 16 '16 at 12:04
Thanks for clarifying. I've updated my answer. It should keep merged words local to each associated post, and should streamline score calculation. — InverseFalcon, Aug 18 '16 at 08:45
Thank u very much @InverseFalcon. I got another task for now and when I finish it, I'll go back to this one. I will test it later — Paulo Fabrício, Aug 18 '16 at 16:59

score 0 · Accepted Answer · answered Aug 15 '18 at 21:41

My solution is:

    query = '''
    WITH split("%s"," ") AS words
    MERGE (p:Post {id: "%s", text:"%s"})
    WITH p, words
    UNWIND range(0,size(words)-2) as idx
    MERGE (w1:Word {name:words[idx]})
    MERGE (w2:Word {name:words[idx+1]})
    MERGE (w1)-[n:NEXT]->(w2)
    ON MATCH SET n.count = n.count + 1
    ON CREATE SET n.count = 1
    MERGE (p)-[:CONTAINS]->(w2)
    MERGE (p)-[:CONTAINS]->(w1)
    '''  %(text, id, text)
