I have a CSV file that contains an edge list, one edge per row. Each row has the following columns:
id1, id2, attr1, attr2, attrX, attrY, attrZ
From this, I want to be able to create (or update) the following, per row:
Vertex A of class X, with id1 and attribute attr1
Vertex B of class X, with id2 and attribute attr2
Edge A->B with edge attributes attrX, attrY, attrZ
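To make the intended per-row behaviour concrete, here is a minimal Python sketch of the create-or-update semantics I'm after. It uses plain in-memory dicts rather than OrientDB, and the function name `upsert_row`, the vertex property name `attr`, and the repeated second row are my own assumptions for illustration only.

```python
def upsert_row(vertices, edges, row):
    """Apply one CSV row: upsert vertex A, vertex B, and the edge A->B."""
    id1, id2, attr1, attr2, attr_x, attr_y, attr_z = row
    # Both vertices are of class X and keyed by their id; an existing
    # vertex is reused and only its attribute is refreshed.
    vertices.setdefault(id1, {"class": "X", "id": id1})["attr"] = attr1
    vertices.setdefault(id2, {"class": "X", "id": id2})["attr"] = attr2
    # The edge is keyed by (from, to); if it already exists, only its
    # attributes are updated instead of creating a second edge.
    edges.setdefault((id1, id2), {}).update(attrX=attr_x, attrY=attr_y, attrZ=attr_z)


vertices, edges = {}, {}
upsert_row(vertices, edges, ["10", "11", "A", "B", "100", "200", "1"])
# Hypothetical repeat of the same edge with a changed attrZ.
upsert_row(vertices, edges, ["10", "11", "A", "B", "100", "200", "7"])
print(len(vertices), len(edges), edges[("10", "11")]["attrZ"])
```

After both calls there are still two vertices and one edge, and the edge carries the attribute values from the later row.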
This is the configuration file I'm feeding to oetl.sh (using OrientDB 2.2 beta2):
{
  "source": { "file": { "path": "/data/sample/test.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "merge": { "joinFieldName": "id1", "lookup": "X.id" } },
    { "vertex": { "class": "X", "skipDuplicates": true } },
    { "edge": {
        "unresolvedLinkAction": "WARNING",
        "class": "EdgeTypeClass",
        "joinFieldName": "id2",
        "lookup": "X.id",
        "edgeFields": { "attrX": "${input.attrX}", "attrY": "${input.attrY}", "attrZ": "${input.attrZ}" }
      }
    },
    { "field": { "fieldNames": [ "id1", "id2", "attr1", "attr2", "attrX", "attrY", "attrZ" ], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/test2",
      "dbType": "graph"
    }
  }
}
The sample data I used to run the test is as follows:
10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3
When I run the oetl.sh script with the given configuration and sample data, it creates 4 vertices instead of 5 and no edges. There are no attributes on the vertices at all.
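As a sanity check on what I expect from the sample data, the short Python sketch below (independent of oetl; the variable names are mine) counts the distinct vertex ids and edges in the four rows. It also lists ids that appear only in the id2 column, since those would have to be created by something other than the vertex transformer working on id1.

```python
import csv
import io

# The four sample rows from the question, verbatim.
SAMPLE = '''10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3
'''

rows = list(csv.reader(io.StringIO(SAMPLE)))

# Every id in either of the first two columns should become one vertex.
vertex_ids = {r[0] for r in rows} | {r[1] for r in rows}
# Every (id1, id2) pair should become one edge.
edge_keys = {(r[0], r[1]) for r in rows}
# Ids that never appear as id1, i.e. only as an edge target.
only_targets = {r[1] for r in rows} - {r[0] for r in rows}

print(len(vertex_ids), len(edge_keys), sorted(only_targets))
```

This gives 5 vertices and 4 edges, with id 13 appearing only as a target. That may be related to the 4 vertices I actually get, but I have not confirmed it.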
So these are the questions:
1. Is there a way in the vertex clause to specify vertex attributes/fields the same way that one can for edges (i.e. edgeFields)? The documentation doesn't mention anything about it, but it seems odd that you wouldn't be able to do it.
2. Rather than relying on the edge to create the outbound vertex, should I instead create the two vertices explicitly, and if so, how do I specify that in the configuration file? When I try to add two "vertex" clauses, it only seems to pick up the last one as the "current" vertex.
3. It's possible that the specific edge (id1 -> id2) already exists. Is it possible to update only the edge attributes in this case?
My sinking feeling is that, given the complexity and number of things I'm trying to pack into this, it will be simpler to write my own ETL (e.g. using the Java API) instead of relying on oetl. I was hoping to avoid that, if only because an oetl configuration is more maintainable.