Neptune has a bulk loader feature [1] that integrates only with S3. It requires specific file formats: a CSV format for loading Property Graph data (data that would be queried via Gremlin or openCypher) and four RDF formats for RDF data.
Depending on the size of the data, converting it into the CSV format and staging it in S3 for the bulk loader is often preferable to building your own loading mechanism. The bulk loader loads data concurrently and can take advantage of all of the resources available on a given Neptune writer instance.
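For reference, the bulk loader is invoked through an HTTP endpoint on the cluster itself (the call has to come from inside the cluster's VPC). A minimal sketch in Python, assuming a hypothetical cluster endpoint, S3 prefix, and IAM role ARN:

```python
import requests

# Hypothetical values - replace with your cluster endpoint, S3 prefix, and IAM role.
NEPTUNE_ENDPOINT = "https://my-neptune-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182"

payload = {
    "source": "s3://my-bucket/neptune-load/",   # prefix containing the CSV files
    "format": "csv",                            # property-graph CSV (Gremlin/openCypher)
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "region": "us-east-1",
    "failOnError": "FALSE",
    "parallelism": "HIGH",                      # let the loader use the writer's vCPUs
}

# Kick off the load job; the response includes a loadId you can poll for status.
resp = requests.post(f"{NEPTUNE_ENDPOINT}/loader", json=payload)
resp.raise_for_status()
load_id = resp.json()["payload"]["loadId"]

# Check the job status.
status = requests.get(f"{NEPTUNE_ENDPOINT}/loader/{load_id}").json()
print(status["payload"]["overallStatus"]["status"])
```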
Loading via a Gremlin query may be fine for smaller datasets or incremental loads. Just realize that all Gremlin write queries are single-threaded, so unless you build concurrency into your write logic, your write queries will only consume the resources of a single vCPU. There are best practices for streaming data into Neptune that cover how to scale writes with concurrency [2].
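To illustrate the concurrency point, here is a rough sketch using gremlin-python and a thread pool, where each worker opens its own connection and writes a batch of vertices (the endpoint and data are hypothetical; batch sizes and worker counts would need tuning for your instance size):

```python
from concurrent.futures import ThreadPoolExecutor

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

NEPTUNE_WSS = "wss://my-neptune-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin"

def write_batch(batch):
    # Each worker uses its own connection so batches are written in parallel,
    # spreading the work across more than one vCPU on the writer.
    conn = DriverRemoteConnection(NEPTUNE_WSS, "g")
    g = traversal().withRemote(conn)
    try:
        for vertex in batch:
            g.addV(vertex["label"]).property("name", vertex["name"]).iterate()
    finally:
        conn.close()

# Hypothetical data split into batches; each batch goes to a separate worker.
vertices = [{"label": "person", "name": f"user-{i}"} for i in range(1000)]
batches = [vertices[i:i + 100] for i in range(0, len(vertices), 100)]

with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(write_batch, batches)
```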
Other alternatives for loading data into Neptune via Gremlin include:
a. The AWS SDK for Pandas - you can read your data into a Pandas DataFrame and push the DataFrame directly to Neptune [3] (see the sketch after this list).
b. Using the Gremlin io() step. With this method, your files would need to be accessible via a web server (http). So you would have to mount your EFS filesystem to an EC2 instance and expose the EFS mount/path behind a web server (Apache web server, NGINX, etc.) running on that instance. You could then issue commands like:
g.io('https://<my_web_server>/<my_file>').read()
Note that this last method works with files already in GraphML or GraphSON format.
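Regarding option (a), a minimal sketch with the AWS SDK for Pandas (awswrangler), assuming a hypothetical cluster endpoint and a vertex DataFrame that follows the same ~id / ~label column convention as the bulk loader CSVs:

```python
import awswrangler as wr
import pandas as pd

# Hypothetical endpoint - use your cluster's writer endpoint and port.
client = wr.neptune.connect(
    "my-neptune-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com",
    8182,
    iam_enabled=False,
)

# Each row becomes a vertex; property columns are loaded alongside ~id and ~label.
df = pd.DataFrame([
    {"~id": "p1", "~label": "person", "name": "Alice"},
    {"~id": "p2", "~label": "person", "name": "Bob"},
])

wr.neptune.to_property_graph(client, df, batch_size=50)
```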
[1] https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
[2] https://aws-samples.github.io/aws-dbs-refarch-graph/src/writing-from-amazon-kinesis-data-streams/
[3] https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb