0

How can I remove duplicate vertices/edges in an Apache AGE graph database?

For example, if there are two vertices labeled "User" with the same "name" property value, how can I delete one of them?

I want the query to be applicable to all types of vertices/edges.

Omar Saad

6 Answers

2

In order to drop duplicate vertices, you can formulate a query as follows:

SELECT * FROM cypher('graph', $$ 
MATCH (u:User {name: 'user'}) WITH u SKIP 1 DELETE u
RETURN u $$) AS (u agtype);

Here SKIP 1 allows us to skip/leave the first row (in other words the first vertex found) and then we apply DELETE on the remaining vertices.
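Since the question asks for something that applies to any vertex type, here is a minimal sketch of a helper that builds the same SKIP 1 statement for an arbitrary graph, label, and property. The function name build_dedup_vertex_query and its validation rules are hypothetical, not part of AGE; the sketch only produces the SQL text and leaves execution to whatever client you use.

```python
import re

# Hypothetical helper: only simple identifiers are accepted, because the
# names are interpolated directly into the statement.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_dedup_vertex_query(graph, label, prop, value):
    """Build the 'keep the first match, delete the rest' statement as text."""
    for name in (graph, label, prop):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # This sketch does no escaping, so it rejects values that would break
    # the Cypher string literal or the $$ dollar quoting.
    if "'" in value or "\\" in value or "$$" in value:
        raise ValueError("value contains characters this sketch does not escape")
    return (
        f"SELECT * FROM cypher('{graph}', $$ "
        f"MATCH (u:{label} {{{prop}: '{value}'}}) "
        f"WITH u SKIP 1 "
        f"DELETE u "
        f"RETURN u $$) AS (u agtype);"
    )


print(build_dedup_vertex_query("graph", "User", "name", "user"))
```

Without an ORDER BY, which vertex counts as "first" is up to the database, so this is only suitable when the duplicates are truly interchangeable.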

In the same way, in order to drop duplicate edges, use:

SELECT * FROM cypher('graph', $$ 
MATCH (u:User {name: 'user'})-[e:EDGE]->(v: User1 {name: 'user1'}) WITH e SKIP 1 DELETE e
RETURN e $$) AS (e agtype);
Zainab Saad
0

You can use cypher queries to delete the nodes/edges.

Another way could be to try merging the nodes and then removing the data from the merged node you do not require.

Let me know if it works for you.

0

I don't think there is a way to match every duplicate and delete them all at once; you have to delete each duplicate manually. For example:

postgres=# SELECT * FROM cypher('test_graph', $$ 
CREATE (u:user {name: 'user'}) 
RETURN u $$) AS (u agtype);
                                         u                                         
-----------------------------------------------------------------------------------
 {"id": 1125899906842625, "label": "user", "properties": {"name": "user"}}::vertex
(1 row)

postgres=# SELECT * FROM cypher('test_graph', $$ 
CREATE (u:user {name: 'user'}) 
RETURN u $$) AS (u agtype);
                                         u                                         
-----------------------------------------------------------------------------------
 {"id": 1125899906842626, "label": "user", "properties": {"name": "user"}}::vertex
(1 row)

Here we create two identical vertices, but as you can see their IDs are different, so when we want to delete one of them we use:

SELECT * FROM cypher('test_graph', $$ 
MATCH (u) 
WHERE id(u) = 1125899906842625 
DELETE u $$) AS (u agtype);

and with this we delete the specific vertex (or edge) that we want.
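As a rough sketch of how the delete-by-ID statement could be generated for either a vertex or an edge, the helper below builds the SQL text from an ID. The name build_delete_by_id_query and the kind parameter are hypothetical, and the edge pattern ()-[e]->() is an assumption about how you would match an edge by ID; nothing here talks to the database.

```python
import re

# Hypothetical helper: the graph name is interpolated, so restrict it to a
# simple identifier.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_delete_by_id_query(graph, element_id, kind="vertex"):
    """Build a statement that deletes one vertex or edge by its AGE id."""
    if not _IDENT.match(graph):
        raise ValueError(f"unsafe graph name: {graph!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(element_id, bool) or not isinstance(element_id, int):
        raise TypeError("element_id must be an int")
    if kind == "vertex":
        var, pattern = "u", "(u)"
    elif kind == "edge":
        var, pattern = "e", "()-[e]->()"
    else:
        raise ValueError("kind must be 'vertex' or 'edge'")
    return (
        f"SELECT * FROM cypher('{graph}', $$ "
        f"MATCH {pattern} WHERE id({var}) = {element_id} "
        f"DELETE {var} $$) AS ({var} agtype);"
    )


print(build_delete_by_id_query("test_graph", 1125899906842625))
```

A vertex that still has edges attached would need DETACH DELETE instead of DELETE, which this sketch leaves out.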

0

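To do this, rely on the unique ID that AGE generates automatically for every vertex. The following query deletes every User vertex whose ID differs from VERTEX_ID, a placeholder for the ID of the vertex you want to keep, so it removes several duplicates in one pass. Because it matches all User vertices, add a property filter to the MATCH if only some of them are duplicates.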

SELECT * 
FROM cypher('graph_name', $$
    MATCH (v:User)
    WHERE id(v) <> VERTEX_ID
    DETACH DELETE v
$$) as (v agtype);

To avoid such duplicates, the MERGE clause should be utilised.

According to this, "MERGE does a "select-or-insert" operation that first checks if the data exists in the database. If it exists, then Cypher returns it as is or makes any updates you specify on the existing node or relationship. If the data does not exist, then Cypher will create it with the information you specify."
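To make the "select-or-insert" behaviour concrete, here is a small in-memory sketch. It is not AGE code: merge_vertex and the store dictionary are hypothetical, and the sketch compares the full property set for equality, which is a simplification of how MERGE matches a pattern. In AGE the corresponding statement would presumably be MERGE (u:User {name: 'user'}) run inside cypher().

```python
def merge_vertex(store, label, properties):
    """Return the vertex with this label and properties, creating it if absent."""
    # Property values must be hashable for this simplified lookup key.
    key = (label, tuple(sorted(properties.items())))
    if key not in store:
        # "Insert" branch: nothing matched, so create the vertex once.
        store[key] = {
            "id": len(store) + 1,
            "label": label,
            "properties": dict(properties),
        }
    # "Select" branch: an existing vertex is returned unchanged.
    return store[key]


store = {}
first = merge_vertex(store, "User", {"name": "user"})
second = merge_vertex(store, "User", {"name": "user"})
print(first is second, len(store))  # True 1
```

Calling it twice with the same arguments yields one vertex, whereas two CREATE calls would have produced two.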

Tito
0
  1. First, identify the duplicates with this query:

     MATCH (u:User)
     WITH u.name AS name, collect(u) AS duplicates
     WHERE size(duplicates) > 1
     RETURN duplicates

  2. Next, decide which vertices or edges to keep.

  3. Finally, delete the rest by modifying the previous query. Here the vertex with the highest ID in each group is kept:

     MATCH (u:User)
     WITH u.name AS name, collect(u) AS duplicates, max(id(u)) AS keepId
     WHERE size(duplicates) > 1
     UNWIND duplicates AS duplicate
     WITH duplicate, keepId
     WHERE id(duplicate) <> keepId
     DELETE duplicate
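The group-then-keep-one logic in these steps can be sketched outside the database to check which IDs would be removed. This is only an illustration: ids_to_delete is a hypothetical function, the vertices are plain dictionaries shaped like AGE's vertex output, and the third sample vertex is invented for the example.

```python
from collections import defaultdict


def ids_to_delete(vertices, prop):
    """Return IDs of duplicate vertices, keeping the highest ID per value."""
    groups = defaultdict(list)
    for vertex in vertices:
        props = vertex["properties"]
        if prop in props:  # vertices without the property are left alone
            groups[props[prop]].append(vertex["id"])
    doomed = []
    for ids in groups.values():
        if len(ids) > 1:  # only groups with real duplicates
            keep_id = max(ids)
            doomed.extend(i for i in sorted(ids) if i != keep_id)
    return doomed


vertices = [
    {"id": 1125899906842625, "label": "user", "properties": {"name": "user"}},
    {"id": 1125899906842626, "label": "user", "properties": {"name": "user"}},
    {"id": 1125899906842627, "label": "user", "properties": {"name": "other"}},
]
print(ids_to_delete(vertices, "name"))  # [1125899906842625]
```

Keeping the highest ID is an arbitrary choice; swap max for min to keep the oldest vertex instead.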
    
0

To find duplicate vertices/edges by property value, combine the MATCH clause with an aggregating WITH and a WHERE filter, which together play the role of SQL's GROUP BY and HAVING.

MATCH (u:User)
WITH u.name AS name, COUNT(u) AS count
WHERE count > 1
RETURN name, count

Once you have located the duplicates, use the MATCH clause to select the vertices or edges that share the duplicated value and the DELETE clause to remove the extra ones.

MATCH (u:User {name: 'dup_name'})  
WITH u LIMIT 1
DELETE u

The LIMIT 1 clause ensures that only one of the duplicate vertices is deleted per run. If more than two vertices share the value, repeat the query until a single vertex remains.