I have three CSV files: one with persons, one with addresses, and one with the person-address connections (each file contains a header plus one row). For testing purposes, on the first run I use:

config create_schema: true, load_new: true, load_threads: 3

The import succeeds and creates two vertices with one edge between them.

When I then run the same script (same data, same input script) with a different config:

config create_schema: false, load_new: false, load_threads: 3

the vertices are unchanged, but the edge is duplicated: I end up with two vertices and two edges between the same pair of vertices.

This is the script that I run:

inputfiledir = 'data/'
personInput = File.csv(inputfiledir + 'sna_person_test.csv').delimiter(',')
addressInput = File.csv(inputfiledir + 'sna_address_test.csv').delimiter(',')
personAddressInput = File.csv(inputfiledir + 'san_person_address_test.csv').delimiter(',')

load(personInput).asVertices {
    label "person"
    key "id"
}

load(addressInput).asVertices {
    label "address"
    key "id"
}

load(personAddressInput).asEdges {
    label "has_address"
    outV "person_id", {
        label "person"
        key "id"
    }
    inV "address_id", {
        label "address"
        key "id"
    }
}

Is there a way to avoid this?

Thanks

CristiC

1 Answer

This happens because edges do not have an ID, so Graph Loader has no way to determine whether an edge it is about to insert already exists. As a result, each subsequent load of the same data duplicates the edges but not the vertices.
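To make the difference concrete, here is a rough Python sketch of the behaviour described above. It is only a toy model and not Graph Loader's actual implementation: `ToyGraph`, `load_vertex`, `load_edge`, `load_edge_once` and `run_load` are hypothetical names invented for illustration. Vertices are stored under their declared key, so reloading them changes nothing, while edges carry no identity and are simply appended.

```python
# Toy model only: NOT Graph Loader code. All names here are hypothetical.
class ToyGraph:
    def __init__(self):
        self.vertices = {}  # (label, key value) -> properties
        self.edges = []     # (label, out vertex, in vertex), no ID of their own

    def load_vertex(self, label, key_value, props=None):
        # Keyed upsert: the same (label, key) always maps to the same vertex.
        self.vertices.setdefault((label, key_value), {}).update(props or {})

    def load_edge(self, label, out_v, in_v):
        # Nothing to match against, so every load appends another edge.
        self.edges.append((label, out_v, in_v))

    def load_edge_once(self, label, out_v, in_v):
        # Assumed workaround: treat (label, out, in) as the edge's identity
        # and skip the insert when that combination is already present.
        edge = (label, out_v, in_v)
        if edge not in self.edges:
            self.edges.append(edge)


def run_load(graph, add_edge):
    # Mirrors the mapping in the question: two vertices and one edge.
    graph.load_vertex("person", "p1")
    graph.load_vertex("address", "a1")
    add_edge("has_address", ("person", "p1"), ("address", "a1"))


plain = ToyGraph()
run_load(plain, plain.load_edge)
run_load(plain, plain.load_edge)      # second run of the same data

checked = ToyGraph()
run_load(checked, checked.load_edge_once)
run_load(checked, checked.load_edge_once)

print(len(plain.vertices), len(plain.edges))      # 2 2
print(len(checked.vertices), len(checked.edges))  # 2 1
```

The `load_edge_once` variant only shows the general idea of checking for an existing edge before inserting. Whether you do that by filtering the edge file before a reload or by cleaning up duplicates afterwards depends on your setup.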

peytoncas