
I'm using the bulk loader to load data from CSV files on S3 into a Neptune DB cluster. The data loads successfully. However, when I reload the data with some of the nodes' property values modified, the new value does not replace the old one. Instead, it is added alongside it, so the property becomes a list of values. For example:

Initial values loaded:

~id,~label,ip:string,creationTime:date
2,user,"1.2.3.4",2019-02-13

If I reload this node with a different ip:

2,user,"5.6.7.8",2019-02-13

Then I run the following traversal: g.V(2).valueMap(), and get: ip=[1.2.3.4, 5.6.7.8], creationTime=[2019-02-13]

While this behavior may be useful for some use cases, it is mostly undesired. I want the new value to replace the old one. I couldn't find anything in the documentation about how the loader behaves when nodes are reloaded, and there is no relevant parameter to configure in the API request. How can I make reloaded nodes overwrite the existing ones?

Muli

2 Answers


Update: Neptune now supports single-cardinality bulk loading. Just set the following parameter in the load request:

updateSingleCardinalityProperties = TRUE

Source: https://docs.aws.amazon.com/neptune/latest/userguide/load-api-reference-load.html
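
As a rough sketch of what such a load request might look like, the Python snippet below builds (but does not send) a POST to the loader endpoint with the flag enabled. The endpoint, bucket, role ARN, and region are hypothetical placeholders. As far as I understand the loader documentation, the flag only affects properties with single cardinality, so the CSV column header may also need to declare it, e.g. `ip:string(single)`; please verify against the reference linked above.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute your own cluster endpoint.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"


def build_load_request(source, iam_role_arn, region):
    """Build a bulk-loader request that asks Neptune to overwrite
    existing single-cardinality property values on reload."""
    payload = {
        "source": source,              # S3 URI of the CSV file(s)
        "format": "csv",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        # The flag from the answer above; the loader expects a string value.
        "updateSingleCardinalityProperties": "TRUE",
    }
    return urllib.request.Request(
        url=f"{NEPTUNE_ENDPOINT}/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All argument values here are made up for illustration.
request = build_load_request(
    source="s3://example-bucket/users.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
    region="us-east-1",
)

# To actually submit the job from a host that can reach the cluster:
# urllib.request.urlopen(request)
```

The request is only constructed here so the payload can be inspected first; uncomment the last line to send it.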

HackSlash
bwmbrown

Currently the Neptune bulk loader uses Set cardinality. To update an existing property, the best way is to use Gremlin via the HTTP or WebSocket endpoint.

From Gremlin you can specify that you want single cardinality (thus replacing the property value rather than adding to it). An example would be

g.V('2').property(single,"ip","5.6.7.8")
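
If you want to send that traversal over the HTTP endpoint from a script, a minimal sketch could look like the following. It only builds the request; the endpoint is a hypothetical placeholder, and the helper names (`quote`, `single_property_query`, `build_gremlin_request`) are made up for illustration. The string escaping is deliberately simple and is not a substitute for proper input validation.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute your own cluster endpoint.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"


def quote(text):
    """Wrap text in single quotes, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def single_property_query(vertex_id, key, value):
    """Return a Gremlin string that replaces a property value
    by using single cardinality."""
    return f"g.V({quote(vertex_id)}).property(single,{quote(key)},{quote(value)})"


def build_gremlin_request(query):
    """Build a POST to the /gremlin HTTP endpoint carrying the query."""
    return urllib.request.Request(
        url=f"{NEPTUNE_ENDPOINT}/gremlin",
        data=json.dumps({"gremlin": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = single_property_query("2", "ip", "5.6.7.8")
request = build_gremlin_request(query)

# To actually run it from a host that can reach the cluster:
# urllib.request.urlopen(request)
```

For many changed rows, the same helper could be called once per row of the modified CSV, at the cost of one request per update.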

Hope that helps, Kelvin

Kelvin Lawrence
  • Thanks Kelvin. I also approached AWS and they confirmed that Set is currently the only cardinality configuration possible for the bulk loader. – Muli Mar 04 '19 at 07:36