
I have files stored in EFS and I want to upload them to Neptune. I came across many references where files are uploaded to Neptune from S3, but in my use case the files are in EFS. Is it possible to upload from EFS to Neptune through Python?

Does Neptune support loading files from EFS without using S3?

I tried using the Gremlin library to connect to Neptune and used the g.inject() method to upload the files, but it didn't work.

deepika
  • Where you mentioned `g.inject()`, did you mean `g.io()`? Neptune supports `g.io()` for loading GraphML and/or GraphSON data. The data has to be accessible using HTTPS from a location that can be reached by the Neptune engine (e.g. an HTTPS server or an S3 pre-signed URL). – Kelvin Lawrence Apr 13 '23 at 14:36

1 Answer


Neptune has a bulk loader feature [1] that integrates specifically with S3. It requires specific file formats: a CSV format for loading property graph data (data that would be queried via Gremlin or openCypher), and four RDF formats for RDF data.
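For reference, the property graph CSV format uses reserved column headers. A minimal, hypothetical example (sample data made up here) would be a vertex file:

    ~id,~label,name:String,age:Int
    v1,person,Alice,30
    v2,person,Bob,31

and an edge file:

    ~id,~from,~to,~label,since:Int
    e1,v1,v2,knows,2020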

Depending on the size of the data, you may want to convert it into the CSV format, load it into S3, and use Neptune's bulk loader rather than building your own loading mechanism. The bulk loader loads the data with concurrency and can take advantage of all of the resources available on a given Neptune writer instance.
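If you go that route, the load job is started with an HTTP request to the cluster's loader endpoint. A minimal sketch in Python - the endpoint, bucket, and role ARN below are placeholders you would replace with your own:

    import requests

    # Placeholder values - substitute your cluster endpoint, S3 path, and IAM role.
    ENDPOINT = "https://my-cluster.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com:8182"

    payload = {
        "source": "s3://my-bucket/neptune-load/",
        "format": "csv",
        "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
        "region": "us-east-1",
        "failOnError": "FALSE",
        "parallelism": "HIGH",
    }

    # Kick off the load job; the response contains a loadId.
    resp = requests.post(f"{ENDPOINT}/loader", json=payload)
    load_id = resp.json()["payload"]["loadId"]

    # Poll the same endpoint with the loadId to track progress.
    status = requests.get(f"{ENDPOINT}/loader/{load_id}")
    print(status.json())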

Loading via a Gremlin query may be fine for smaller datasets or incremental loads. Just realize that each Gremlin write query is single-threaded, so unless you build concurrency into your write logic, your write queries will only consume the resources of a single vCPU. There are best practices related to streaming data into Neptune that cover how to scale writes with concurrency [2]; a rough sketch of the pattern follows.
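As an illustration of that pattern (not the reference implementation from [2]), you could batch vertices and fan the batches out across threads with gremlinpython; the endpoint, label, and data below are placeholders:

    from concurrent.futures import ThreadPoolExecutor
    from gremlin_python.driver import client

    # Placeholder endpoint - replace with your cluster's Gremlin endpoint.
    # pool_size matches the number of worker threads below.
    c = client.Client(
        "wss://my-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
        "g",
        pool_size=8,
    )

    def write_batch(rows):
        # Chain several addV() calls into one request to amortize round trips.
        # (Real code should escape values rather than interpolate them.)
        query = "g" + "".join(
            f".addV('person').property(id, '{r['id']}').property('name', '{r['name']}')"
            for r in rows
        )
        c.submit(query).all().result()

    rows = [{"id": f"v{i}", "name": f"name-{i}"} for i in range(1000)]
    batches = [rows[i:i + 50] for i in range(0, len(rows), 50)]

    # Each worker sends its own request, so the writer can use multiple vCPUs.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(write_batch, batches))

    c.close()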

Other alternatives for loading data into Neptune via Gremlin include:

a. The AWS SDK for Pandas: read your data into a Pandas DataFrame and push the DataFrame directly to Neptune [3] (a minimal sketch appears after this list).

b. The Gremlin io() step: with this method, your files need to be accessible via a web server (HTTP). You would mount your EFS filesystem on an EC2 instance and expose the EFS mount/path behind a web server (Apache HTTP Server, NGINX, etc.) running on that instance. You could then issue commands like:

g.io('https://<my_web_server>/<my_file>').read()

This last method only works with files already formatted as GraphML or GraphSON.
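For option (b), one way to send that traversal from Python is to submit it as a string through the gremlinpython driver; the endpoint and file URL here are placeholders:

    from gremlin_python.driver import client

    # Placeholder values - your cluster endpoint and a URL served by the
    # web server fronting the EFS mount.
    c = client.Client(
        "wss://my-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
        "g",
    )
    file_url = "https://my-web-server.example.com/data.graphml"

    # read() loads the GraphML/GraphSON file at the URL into the graph.
    c.submit(f"g.io('{file_url}').read()").all().result()
    c.close()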
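For option (a), a minimal sketch using the AWS SDK for Pandas (awswrangler); the endpoint and sample rows are placeholders, and the ~id/~label columns follow the convention its property graph writer expects:

    import awswrangler as wr
    import pandas as pd

    # Sample data - ~id and ~label identify each vertex; the remaining
    # columns become vertex properties.
    df = pd.DataFrame({
        "~id": ["v1", "v2"],
        "~label": ["person", "person"],
        "name": ["Alice", "Bob"],
    })

    # Placeholder endpoint - replace with your cluster's host and port.
    client = wr.neptune.connect(
        "my-cluster.cluster-xxxx.us-east-1.neptune.amazonaws.com",
        8182,
        iam_enabled=False,
    )
    wr.neptune.to_property_graph(client, df)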

[1] https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
[2] https://aws-samples.github.io/aws-dbs-refarch-graph/src/writing-from-amazon-kinesis-data-streams/
[3] https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb

Taylor Riggan