
I need to take texts and save them to Neo4j. For each text, I split it into words and create a [:NEXT] relationship from each word to the word that follows it, plus a [:CONTAINS] relationship from the text to each of its words. Finally, I try to find the word with the most [:NEXT] relationships, counted only within the given text and not across the whole database.

Unfortunately, I only get the count across the whole database.

The query is:

query = '''
        WITH split("%s"," ") as words 
        MERGE (p:Post {id: '%s', text: '%s'})
        WITH p, words
        UNWIND range(0,size(words)-2) as idx
        MERGE (w1:Word {name:words[idx]})
        MERGE (w2:Word {name:words[idx+1]})
        MERGE (w1)-[:NEXT]->(w2)
        MERGE (p)-[:CONTAINS]->(w2)
        MERGE (p)-[:CONTAINS]->(w1)
        WITH p
        MATCH (p)-[c:CONTAINS]->(w:Word)
        MATCH ()-[n1:NEXT]->(:Word {name: w.name})<-[:CONTAINS]-(p)
        MATCH (p)-[:CONTAINS]-(:Word {name: w.name})-[n2:NEXT]->()
        WITH COUNT(n1) + COUNT(n2)AS score, w.name AS word, p.text AS post, p.id AS _id
        RETURN post, word, score, _id;
        '''  %(text, id, text)

I just can't find out the problem here.

Thanks!

Paulo Fabrício

2 Answers


Well, you may have a data modeling problem here.

You're using MERGE when creating your word nodes, so if a word was already added by a prior query with a different text, that same node is reused. Your more common word nodes (a, the, and, I, etc.) will therefore likely have many [:NEXT] relationships, and that number will keep growing with each query.

Is this the behavior you intend, or will you only ask the database about words used in the text given in the query?

EDIT

The problem is the merging of the :Word nodes. A MERGE on :Word alone matches any :Word node created by a previous query, and any future query will match it in turn. It's not enough to merge the :Word node itself; to make your words local to their post, you have to merge the word together with its [:CONTAINS] relationship from the post, in a single pattern.

We can also simplify the patterns used to calculate the word score, since all we need is the number of [:NEXT] relationships, in either direction, on each word.
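
To make the difference concrete, here is a small pure-Python sketch (no Neo4j involved) that mimics the two modeling choices: words as global nodes shared by every post, versus words scoped to their own post. The function names and the sample posts are made up for illustration, and the `set` of word pairs only approximates how MERGE avoids duplicate [:NEXT] relationships.

```python
from collections import defaultdict


def next_degree_shared(posts):
    """Words are global: one node per distinct word across ALL posts."""
    edges = set()  # distinct (w1, w2) pairs, roughly what MERGE (w1)-[:NEXT]->(w2) keeps
    for text in posts.values():
        words = text.split(" ")
        edges.update(zip(words, words[1:]))
    degree = defaultdict(int)
    for w1, w2 in edges:
        degree[w1] += 1  # outgoing NEXT
        degree[w2] += 1  # incoming NEXT
    return dict(degree)


def next_degree_per_post(posts):
    """Words are local: each post gets its own word nodes and NEXT pairs."""
    result = {}
    for post_id, text in posts.items():
        words = text.split(" ")
        degree = defaultdict(int)
        for w1, w2 in set(zip(words, words[1:])):
            degree[w1] += 1
            degree[w2] += 1
        result[post_id] = dict(degree)
    return result


# Hypothetical sample data
posts = {"p1": "the cat sat", "p2": "the dog ran the race"}
shared = next_degree_shared(posts)
local = next_degree_per_post(posts)
print(shared["the"], local["p1"]["the"], local["p2"]["the"])
```

With shared nodes, "the" collects the [:NEXT] pairs from both posts, which is the "sum of the whole database" effect from the question. With per-post nodes, each post only sees its own pairs.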

    query = '''
    WITH split("%s"," ") as words 
    MERGE (p:Post {id: '%s', text: '%s'})
    WITH p, words
    UNWIND range(0,size(words)-2) as idx
    MERGE (p)-[:CONTAINS]->(w1:Word {name:words[idx]})
    MERGE (p)-[:CONTAINS]->(w2:Word {name:words[idx+1]})
    MERGE (w1)-[:NEXT]->(w2)
    WITH DISTINCT p  // the UNWIND above leaves one row per word pair; collapse to one row per post
    MATCH (p)-[:CONTAINS]->(w:Word)
    WITH size( ()-[:NEXT]-(w) ) AS score, w.name AS word, p.text AS post, p.id AS _id
    RETURN post, word, score, _id;
    '''  %(text, id, text)
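
One caveat about the `%` string formatting used above: any quote character inside the text will break the generated Cypher. A minimal sketch of the same query using query parameters instead, assuming the official Neo4j Python driver (where `session.run` accepts parameters as keyword arguments) and a Neo4j version that supports the `$name` parameter syntax. The helper name `score_words` and the constant `SCORE_QUERY` are hypothetical, and `session` is whatever session object you already have open.

```python
SCORE_QUERY = '''
WITH split($text, " ") AS words
MERGE (p:Post {id: $id, text: $text})
WITH p, words
UNWIND range(0, size(words) - 2) AS idx
MERGE (p)-[:CONTAINS]->(w1:Word {name: words[idx]})
MERGE (p)-[:CONTAINS]->(w2:Word {name: words[idx + 1]})
MERGE (w1)-[:NEXT]->(w2)
WITH DISTINCT p  // one row per post instead of one per word pair
MATCH (p)-[:CONTAINS]->(w:Word)
WITH size( ()-[:NEXT]-(w) ) AS score, w.name AS word, p.text AS post, p.id AS _id
RETURN post, word, score, _id
'''


def score_words(session, post_id, text):
    """Run the scoring query with parameters; returns (word, score) pairs.

    `session` is assumed to behave like a neo4j driver session:
    session.run(query, **params) yields records indexable by column name.
    """
    result = session.run(SCORE_QUERY, id=post_id, text=text)
    return [(record["word"], record["score"]) for record in result]
```

Because the text travels as a parameter and not as part of the query string, apostrophes and double quotes in a post no longer need escaping.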
InverseFalcon

My solution is:

query = '''
    WITH split("%s"," ") AS words 
    MERGE (p:Post {id: "%s", text:"%s"})
    WITH p, words 
    UNWIND range(0,size(words)-2) as idx
    MERGE (w1:Word {name:words[idx]})
    MERGE (w2:Word {name:words[idx+1]})
    MERGE (w1)-[n:NEXT]->(w2)
    ON MATCH SET n.count = n.count + 1
    ON CREATE SET n.count = 1
    MERGE (p)-[:CONTAINS]->(w2)
    MERGE (p)-[:CONTAINS]->(w1)
    '''  %(text, id, text)
Paulo Fabrício