I am trying to insert 624,118,983 records split across 1,000 files, and loading all of them into Neptune takes 35 hours, which is very slow. I have configured a cluster of two db.r5.large instances. The 1,000 files are stored in an S3 bucket, and I submit a single load request that points to the S3 folder containing all of them. When I check the load status, I get the response below:
```json
{
  "status" : "200 OK",
  "payload" : {
    "feedCount" : [
      {
        "LOAD_NOT_STARTED" : 640
      },
      {
        "LOAD_IN_PROGRESS" : 1
      },
      {
        "LOAD_COMPLETED" : 358
      },
      {
        "LOAD_FAILED" : 1
      }
    ],
    "overallStatus" : {
      "fullUri" : "s3://myntriplesfiles/ntriple-folder/",
      "runNumber" : 1,
      "retryNumber" : 0,
      "status" : "LOAD_IN_PROGRESS",
      "totalTimeSpent" : 26870,
      "startTime" : 1639289761,
      "totalRecords" : 224444549,
      "totalDuplicates" : 17295821,
      "parsingErrors" : 1,
      "datatypeMismatchErrors" : 0,
      "insertErrors" : 0
    }
  }
}
```
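As a quick sanity check on the numbers in this response, the sketch below does the arithmetic in plain Bash. The values are copied by hand from the output above, it assumes `totalTimeSpent` is reported in seconds, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Values copied by hand from the load-status response above.
not_started=640
in_progress=1
completed=358
failed=1
total_records=224444549
total_time_spent=26870   # assumed to be seconds

total_files=$((not_started + in_progress + completed + failed))
pct_done=$((completed * 100 / total_files))             # integer percent of files completed
records_per_sec=$((total_records / total_time_spent))   # average records loaded per second
secs_per_file=$((total_time_spent / completed))         # average seconds per completed file

echo "files: $total_files, completed: ${pct_done}%, throughput: ${records_per_sec} records/s, ~${secs_per_file} s per completed file"
```

Bash arithmetic uses integer division, so every figure is rounded down.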
What I notice is that LOAD_IN_PROGRESS is always 1, which means Neptune is not loading multiple files in parallel. How do I tell Neptune to load the 1,000 files in parallel, for example with a parallelization factor of 10? Am I missing some configuration?
This is how I call the bulk load API:
```shell
curl -X POST -H 'Content-Type: application/json' https://neptune-hostname:8182/loader -d '
{
  "source" : "s3://myntriplesfiles/ntriple-folder/",
  "format" : "nquads",
  "iamRoleArn" : "my aws arn values goes here",
  "region" : "us-east-2",
  "failOnError" : "FALSE",
  "parallelism" : "HIGH",
  "updateSingleCardinalityProperties" : "FALSE",
  "queueRequest" : "FALSE"
}'
```
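A minimal sketch of the same request as a reusable Bash script, assuming the loader accepts the `parallelism` values `LOW`, `MEDIUM`, `HIGH` and `OVERSUBSCRIBE` as listed in the Neptune loader documentation. The function name `build_load_request` and the `NEPTUNE_ENDPOINT` variable are hypothetical; when `NEPTUNE_ENDPOINT` is unset, the script only prints the request body, so it can be checked without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the loader request body from the question,
# with the parallelism level passed in as the first argument.
build_load_request() {
  local parallelism="$1"
  case "$parallelism" in
    LOW|MEDIUM|HIGH|OVERSUBSCRIBE) ;;   # assumed valid values, per the loader docs
    *) echo "invalid parallelism: $parallelism" >&2; return 1 ;;
  esac
  cat <<EOF
{
  "source" : "s3://myntriplesfiles/ntriple-folder/",
  "format" : "nquads",
  "iamRoleArn" : "my aws arn values goes here",
  "region" : "us-east-2",
  "failOnError" : "FALSE",
  "parallelism" : "$parallelism",
  "updateSingleCardinalityProperties" : "FALSE",
  "queueRequest" : "FALSE"
}
EOF
}

body=$(build_load_request HIGH)

# NEPTUNE_ENDPOINT is an assumed variable, e.g. "neptune-hostname:8182".
# The request is only sent when it is set; otherwise this is a dry run.
if [ -n "${NEPTUNE_ENDPOINT:-}" ]; then
  curl -s -X POST -H 'Content-Type: application/json' \
    "https://${NEPTUNE_ENDPOINT}/loader" -d "$body"
else
  echo "$body"
fi
```

Keeping the body in a heredoc makes it easy to compare against the original request, since only the `parallelism` value is substituted.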
Please advise.