  1. I want to delete vertices by looping over a dataframe.
  2. The vertices to delete are selected by some columns of the dataframe. My function is written as follows, and it times out:
    def delete_vertices_for_label(rows):
        conn = self.remote_connection()
        g = self.traversal_source(conn)
        for row in rows:
            entries = row.asDict()
            create_traversal = __.hasLabel(str(entries["~label"]))
            # Filter on every column except the reserved ~id and ~label columns.
            for key, value in entries.items():
                if key not in ('~id', '~label'):
                    create_traversal.has(key, value)
            g.V().coalesce(create_traversal).drop().iterate()

This function works locally against TinkerGraph. However, when I run it in a Glue job that manipulates data in AWS Neptune, it fails. I also created the Lambda function below, and it still hits the same timeout issue.

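    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.structure.graph import Graph

    def run_sample_gremlin_basedon_property():
        remoteConn = DriverRemoteConnection(
            'ws://' + CLUSTER_ENDPOINT + ":" + CLUSTER_PORT + '/gremlin', 'g')
        graph = Graph()
        g = graph.traversal().withRemote(remoteConn)
        # Each has() call appends a filter to the anonymous traversal in place.
        create_traversal = __.hasLabel("Media")
        create_traversal.has("Media_ID", "99999")
        create_traversal.has("src_name", "NET")
        print("create_traversal:", create_traversal)
        g.V().coalesce(create_traversal).drop().iterate()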


John Rotenstein
Emma Y
  • Is your issue that the drop APIs time out? Or is it broader, about how you are using Glue/Lambda etc.? If it's related to drop timeouts, make sure you are on a newer version than this: https://docs.aws.amazon.com/neptune/latest/userguide/engine-releases-1.0.1.0.200296.0.html – The-Big-K May 25 '19 at 07:28
  • EmmaYang - Do you have any updates you'd like to share on this? – The-Big-K Jun 18 '19 at 06:18
  • @KarthikRajan Thank you, and sorry for the very late reply. I got a chance to talk to the AWS support team. They said the cause is probably that dropping the vertices also searches for their edges at the same time. My Glue job uses about 3000 partitions, so it deletes vertices from many partitions at the same time, and that is why it hits the timeout. – Emma Y Jul 02 '19 at 20:06
  • @KarthikRajan I have another question: when I try to use the CSV bulk loader with Neptune, it gives me the error "Edges Single Cardinality violation", but the Neptune documentation says the default is "set" rather than "single". So is "set" the default only for vertices, while for edges it is "single"? Thank you – Emma Y Jul 02 '19 at 20:08
  • I would like to scope the discussion down to this specific question, so please open a separate question with more details for that issue. For this specific question, I'll try to summarize what you mentioned; do correct me if I capture it incorrectly. – The-Big-K Jul 02 '19 at 20:22

1 Answer


Dropping a vertex also drops its associated properties and edges, so depending on the data it can take a long time. The drop step was optimized in one of the engine releases [1], so make sure you are on a version newer than that. If you still get timeouts, set an appropriate timeout value on the cluster using the cluster parameter for timeouts.

Note: This answer is based on EmmaYang's communication with AWS Support. It looks like the Glue job was configured in a manner that needs a high timeout. I'm not familiar enough with Glue to comment more on that (Emma, can you please elaborate on that?)

[1] https://docs.aws.amazon.com/neptune/latest/userguide/engine-releases-1.0.1.0.200296.0.html
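
If raising the timeout is not enough, one option that is not part of the AWS Support advice above is to drop the matching vertices in small batches, so that no single request has to remove every vertex and all of its edges at once. The following is only a minimal sketch under that assumption. The helper names `matching_vertices` and `drop_in_batches` and the `batch_size` and `max_rounds` parameters are hypothetical; `g` is expected to be a gremlinpython traversal source like the one built in the question.

```python
def matching_vertices(g, label, props):
    """Build g.V().hasLabel(label).has(k, v)... with one has() per property filter."""
    t = g.V().hasLabel(label)
    for key, value in props.items():
        t = t.has(key, value)
    return t


def drop_in_batches(g, label, props, batch_size=500, max_rounds=10000):
    """Drop matching vertices batch_size at a time; return the number of rounds run.

    batch_size and max_rounds are illustrative defaults (assumptions), not tuned values.
    """
    rounds = 0
    # limit(1).toList() is a cheap probe: a non-empty list means vertices remain.
    while rounds < max_rounds and matching_vertices(g, label, props).limit(1).toList():
        # Each round is its own short request instead of one large drop.
        matching_vertices(g, label, props).limit(batch_size).drop().iterate()
        rounds += 1
    return rounds
```

For the Lambda example in the question this would be called as `drop_in_batches(g, "Media", {"Media_ID": "99999", "src_name": "NET"})`. A suitable batch size depends on how many edges each vertex has.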

The-Big-K
  • I am not very deep into Glue when it comes to connecting to Neptune. For example, if I repartition the dataframe into 60 partitions, will that affect the connection to Neptune? Will it open more connections to Neptune than if I repartition the dataframe into 20? The code is similar to this: https://github.com/awslabs/amazon-neptune-tools/tree/master/glue-neptune – Emma Y Jul 02 '19 at 23:49
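
On the partition question: Spark calls a `foreachPartition` function once per partition, so a function shaped like the one in the question opens one connection per partition, and how many are open at once depends on how many tasks run concurrently. Below is a rough sketch of a per-partition handler that makes this explicit and always closes its connection. `make_partition_handler`, `open_connection` and `drop_row` are hypothetical names for illustration; they are not part of Glue, gremlinpython or the glue-neptune library.

```python
def make_partition_handler(open_connection, drop_row):
    """Return a function suitable for DataFrame.foreachPartition.

    Both arguments are hypothetical callables supplied by the caller:
      open_connection() -> (g, close)   opens a traversal source plus a closer
      drop_row(g, row)                  drops the vertices matching one row
    """
    def handle_partition(rows):
        g, close = open_connection()  # exactly one connection per partition
        try:
            for row in rows:
                drop_row(g, row)
        finally:
            close()  # release the connection even if a drop fails
    return handle_partition

# Assumed usage inside the Glue job (not executed here):
#   handler = make_partition_handler(open_connection, drop_row)
#   df.coalesce(20).foreachPartition(handler)
```

With this shape, repartitioning to 60 instead of 20 means 60 connections are opened over the life of the job instead of 20. Reducing the partition count before the call is one way to limit how many drops reach Neptune at the same time.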