Neo4j: Merge duplicating one of three nodes in a query

Question

As I stated in a different post, I am moving from SQL Server to Neo4j, so I'm fighting the learning curve. I've been doing fairly well at searching Stack Overflow and Google to answer my questions, but I have stumbled across a weird query result that doesn't make sense.

C# Code:

public void AddMarketInfo(MarketInfo mi)
{
    Bid bid = mi.Bid;
    Ask ask = mi.Ask;

    var query = clientConnection.Cypher
        .Merge("(newbid:Bid { ID: {bID} })")
        .OnCreate()
        .Set("newbid = {bid}")
            .WithParams(new
            {
                bID = bid.ID,
                bid
            })
        .Merge("(newask:Ask { ID: {aID} })")
        .OnCreate()
        .Set("newask = {ask}")
            .WithParams(new
            {
                aID = ask.ID,
                ask
            })
        .Merge("(newMarketInfo:MarketInfo { ID: {id}, ASK: {askID}, BID: {bidID} })")
        .OnCreate()
        .Set("newMarketInfo = {mi}")
            .WithParams(new
            {
                id = mi.ID,
                bidID = bid.ID,
                askID = ask.ID,
                mi
            })
    .CreateUnique("(newask)-[rA:Ask_Input_Data]->(newMarketInfo)")
    .CreateUnique("(newbid)-[rB:Bid_Input_Data]->(newMarketInfo)");
    query.ExecuteWithoutResults();
}

I'm currently debugging the program, so this statement is being executed on the same data multiple times. Yes, I am going into the database and deleting all nodes for now.

When creating the "Bid" and "Ask" nodes, the query successfully merges with the existing nodes, but the "MarketInfo" node is being duplicated.

Any thoughts why?

Edit 2: Modified Query

So I was doing some more reading in the neo4j documentation:

https://neo4j.com/docs/developer-manual/current/cypher/clauses/merge/#query-merge-on-create-on-match

The example they provided was:

Merge with ON CREATE and ON MATCH: merge a node and set properties if the node needs to be created.

Query.

MERGE (keanu:Person { name: 'Keanu Reeves' })
ON CREATE SET keanu.created = timestamp()
ON MATCH SET keanu.lastSeen = timestamp()
RETURN keanu.name, keanu.created, keanu.lastSeen

The query creates the 'keanu' node, and sets a timestamp on creation time. If 'keanu' had already existed, a different property would have been set.

So I modified my code to "do the same":

var query = graphClient.Cypher
            .Merge("(newbid:Bid { ID: {bID} })")
            .OnMatch()
            .Set("newbid = {bid}")
            .OnCreate()
            .Set("newbid = {bid}")
                .WithParams(new
                {
                    bID = bid.ID,
                    bid
                })

            .Merge("(newask:Ask { ID: {aID} })")
            .OnMatch()
            .Set("newask = {ask}")
            .OnCreate()
            .Set("newask = {ask}")
                .WithParams(new
                {
                    aID = ask.ID,
                    ask
                })

            .Merge("(newMarketInfo:MarketInfo { ID: {id}, ASK: {askID}, BID: {bidID} })")
            .OnCreate()
            .Set("newMarketInfo = {mi}")
                .WithParams(new
                {
                    id = mi.ID,
                    bidID = bid.ID,
                    askID = ask.ID,
                    mi
                })
            .Merge("(newask)-[rA:Ask_Input_Data]->(newMarketInfo)")
            .Merge("(newbid)-[rB:Bid_Input_Data]->(newMarketInfo)");
        query.ExecuteWithoutResultsAsync();

And yet the MarketInfo nodes are still being duplicated. I think I'm heading down the right path now, but there is still something I'm missing.

Jeremy, sorry, I have made a mistake and deleted my previous comment. — Bruno Peres, Feb 27 '18 at 16:34
To confirm: all the properties `mi.ID`, `bid.ID` and `ask.ID` are the same for all executions, right? — Bruno Peres, Feb 27 '18 at 16:43
Also, `CREATE UNIQUE` is deprecated, as you can see [here](https://neo4j.com/docs/developer-manual/current/cypher/clauses/create-unique/). You should change it to `MERGE`. — Bruno Peres, Feb 27 '18 at 16:48
I will swap out the CREATE UNIQUE for the MERGE, but yes...I am using the same function each time to create an ID based off of data found on the MarketInfo Object. The same function is also used on the Bid and Ask objects, and those IDs are not being duplicated for sure. — Jeremy Parker, Feb 27 '18 at 22:42
@JeremyParker I am not sure I understand. Are you saying that if your code merges `(:MarketInfo { ID: 1, ASK: 2, BID: 3})` twice (with the same 3 property values), you would get a new node created each time (*which should not happen*)? Or are you saying there is some variation in those 3 values each time (*which should cause a new node to be created each time*)? — cybersam, Feb 28 '18 at 00:53
@cybersam What I'm saying is, running the code above on X data set, would create one MarketInfo Node, one Bid node and one Ask node. The Ask and Bid will be related to the MarketInfo node. If I re-run the exact same data through, there will be two MarketInfo nodes, one bid node and one Ask node...with the ask and bid nodes being related to both MarketInfo nodes. — Jeremy Parker, Feb 28 '18 at 02:03
Will do. I just ran a big set of data, so I will queue up a different db to show. — Jeremy Parker, Feb 28 '18 at 03:49

score 2 · Accepted Answer · answered Mar 30 '18 at 21:21

So here was the problem, and it annoys me...

You can't use "ID" as a property name on the object. You can use "Id" or "id" or "iD", but not "ID". Once I switched to "Id", the node was never duplicated again.

Jeremy Parker


John · Answer 2 · 2018-03-01T18:14:21.580

0

Based on the information provided, it sounds likely that "(newMarketInfo:MarketInfo { ID: {id}, ASK: {askID}, BID: {bidID} })" has a different ID value each time you run your code.
Another possibility is that the mi parameter contains values that change the node's { ID: {id}, ASK: {askID}, BID: {bidID} } values. For example, you might be merging on { ID: {1}, ASK: {2}, BID: {3} } and then, ON CREATE, immediately setting "newMarketInfo = {mi}", which changes the node's values to { ID: {2}, ASK: {3}, BID: {4}, exampleValue: {5} }. In this scenario, the merge will always create a new node because the stored node no longer matches the merge pattern, but when you inspect the results the created nodes will look identical. Make sense?
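
To make that second hypothesis concrete, here is a minimal sketch in plain C#. It does not talk to Neo4j or use Neo4jClient; it is a toy in-memory model that only imitates "merge on some keys, then replace the node's properties with a map on create". The `MergeDemo` class, the `Price` property, and the shape of the `mi` map are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a graph store. NOT Neo4j: it only imitates
// MERGE (n { ...keys }) ON CREATE SET n = onCreateProps.
public static class MergeDemo
{
    // Each "node" is just a bag of properties.
    private static readonly List<Dictionary<string, object>> Nodes =
        new List<Dictionary<string, object>>();

    public static void Merge(Dictionary<string, object> keys,
                             Dictionary<string, object> onCreateProps)
    {
        // A node matches only if it carries every merge key with an equal value.
        bool exists = Nodes.Any(node => keys.All(k =>
            node.TryGetValue(k.Key, out var value) && Equals(value, k.Value)));
        if (exists)
        {
            return; // matched: nothing is created
        }

        // Created: the map replaces all properties, so merge keys that are
        // missing from (or different in) the map are not stored on the node.
        Nodes.Add(new Dictionary<string, object>(onCreateProps));
    }

    public static void Main()
    {
        var keys = new Dictionary<string, object> { ["ID"] = 1, ["ASK"] = 2, ["BID"] = 3 };

        // Hypothetical serialized 'mi' object that has ID but no ASK/BID.
        var mi = new Dictionary<string, object> { ["ID"] = 1, ["Price"] = 9.5 };

        Merge(keys, mi);
        Merge(keys, mi);                // same input again
        Console.WriteLine(Nodes.Count); // 2: the stored node never matches the pattern

        Nodes.Clear();

        // Same data, but the map also carries the merge keys.
        var full = new Dictionary<string, object>(mi) { ["ASK"] = 2, ["BID"] = 3 };
        Merge(keys, full);
        Merge(keys, full);
        Console.WriteLine(Nodes.Count); // 1: the second merge finds the first node
    }
}
```

In the first run the stored node loses the ASK and BID keys, so the second merge cannot find it and creates another node that looks the same. In the second run the keys survive and the merge matches.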

Update

The photos you've added show that your MarketInfo nodes do not have ASK or BID properties (the properties that you are merging on). Assuming those properties should not be null, this makes me think that my second hypothesis above explains what is happening. To test this, you could try eliminating the

.OnCreate().Set("newMarketInfo = {mi}")

part of your query. In this scenario, see what node your merge persists in the database. Does the node have ASK / BID properties? If yes, then you've found your problem. You can also see if running the query twice in this scenario adds a second node or not. If not, the problem is definitely that ON CREATE clause.
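
A rough sketch of that experiment with Neo4jClient might look like the following. It assumes a pre-4.0 Neo4jClient (synchronous `ExecuteWithoutResults` and `Results`, as in the question), a Neo4j 3.x server that still accepts the `{param}` syntax, and string IDs. The class and method names are made up for illustration, and it needs a running database to execute.

```csharp
using System.Linq;
using Neo4jClient;

// Hypothetical helper for the experiment described above; not part of the original code.
public static class MarketInfoMergeCheck
{
    // Runs the MarketInfo MERGE on its own, with no ON CREATE SET clause.
    // The ID types are an assumption; the question does not show them.
    public static void MergeWithoutOnCreate(IGraphClient client, string id, string askId, string bidId)
    {
        client.Cypher
            .Merge("(newMarketInfo:MarketInfo { ID: {id}, ASK: {askID}, BID: {bidID} })")
            .WithParams(new { id, askID = askId, bidID = bidId })
            .ExecuteWithoutResults();
    }

    // Counts MarketInfo nodes so the result of running the merge twice can be compared.
    public static long CountMarketInfoNodes(IGraphClient client)
    {
        return client.Cypher
            .Match("(m:MarketInfo)")
            .Return(m => m.Count())
            .Results
            .Single();
    }
}
```

Starting from an empty database, calling `MergeWithoutOnCreate` twice with the same values and then `CountMarketInfoNodes` should report one node if the ON CREATE clause is the culprit. If it is, keep in mind that in Cypher `SET n = {map}` replaces all of a node's properties, whereas `SET n += {map}` only adds or updates the properties in the map, so `+=` may be worth trying.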

edited Mar 01 '18 at 18:14

answered Feb 28 '18 at 05:08

John


Also, somewhat unrelated: obviously I don't know why you're adding `askID` and `bidID` properties to the `MarketInfo` node (and maybe you really want to always return those IDs to a client when pulling a `MarketInfo` node), but it LOOKS like you're adding foreign keys to the node for querying purposes. If that is the case, you very likely should eliminate those properties and replace them with a relation. Except you already _have_ the needed relation `(newask)-[rA:Ask_Input_Data]->(newMarketInfo)`, so you could just eliminate those properties. – John Feb 28 '18 at 06:57
I added a photo with the exposed data hoping that someone can see something I'm not. – Jeremy Parker Mar 01 '18 at 14:17
As to the second comment, I am adding the sub node IDs to the parent node for queries in the future. At times, I will not be returning sub nodes, but I will have the need to query the sub nodes in a future function. I felt as though this approach would allow for the most flexibility in the future while also covering my butt when it comes to regulators and clients wanting a paper trail through the nodes to verify that we are doing what we are saying we are doing. – Jeremy Parker Mar 01 '18 at 14:19
@JeremyParker I updated my answer based on the photos. I suspect that my second hypothesis explains what's happening in your code. Regarding using sub node IDs for querying (a.k.a. a relational database foreign key): in Cypher you can do `MATCH (marketInfo:MarketInfo {ID: $marketInfoID})<-[:Ask_Input_Data]-({ID: $askId}) RETURN marketInfo`. That said, your current approach is probably more performant than this one if you add a composite index `ON :MarketInfo(ID, ASK, BID)`. Anyway, just wanted to point out the power of relations. – John Mar 01 '18 at 18:26
I updated my query to, what I thought, match the query that neo4j shows in their documentation...but something is still off. Thoughts? – Jeremy Parker Mar 09 '18 at 19:35
