
I have a CSV file that contains an edge list, one edge per row. It looks like this:

id1, id2, attr1, attr2, attrX, attrY, attrZ

From this, I want to be able to create (or update) the following, per row:

  • Vertex A of class X, with id1 and attribute attr1

  • Vertex B of class X, with id2 and attribute attr2

  • Edge A->B with edge attributes attrX, attrY, attrZ

This is the configuration file I'm feeding to oetl.sh (using OrientDB 2.2 beta2):

{
  "source": { "file": { "path": "/data/sample/test.csv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {} },
    { "merge": { "joinFieldName": "id1", "lookup": "X.id" } },
    { "vertex": { "class": "X", "skipDuplicates": true } },
    { "edge": {
        "unresolvedLinkAction": "WARNING",
        "class": "EdgeTypeClass",
        "joinFieldName": "id2",
        "lookup": "X.id",
        "edgeFields": { "attrX": "${input.attrX}", "attrY": "${input.attrY}", "attrZ": "${input.attrZ}" }
      }
    },
    { "field": { "fieldNames": [ "id1", "id2", "attr1", "attr2", "attrX", "attrY", "attrZ" ], "operation": "remove" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/test2",
      "dbType": "graph"
    }
  }
}

The sample data I used to run the test is as follows:

10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3

When I run the oetl.sh script with the given configuration and sample data, it creates 4 vertices instead of 5 and no edges. There are no attributes on the vertices at all.
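To make the expected result concrete, here is a minimal sketch in plain Python of the per-row behaviour I'm after. It is not OrientDB code and doesn't touch the database. The function name build_graph, the single vertex field name "attr" and the in-memory dicts are only my assumptions for illustration. Each row upserts both vertices and then creates or updates the edge keyed by (id1, id2):

```python
import csv
import io

# The four sample rows from above.
SAMPLE = '''10,11,"A","B",100,200,1
11,12,"B","C",110,201,5
12,14,"C","D",90,250,10
14,13,"D","E",105,210,3
'''


def build_graph(csv_text, vertices=None, edges=None):
    """Upsert two vertices and one edge per CSV row into plain dicts.

    Illustrative only: "attr" as the vertex field name is an assumption,
    and the dicts stand in for class X and the edge class.
    """
    vertices = {} if vertices is None else vertices
    edges = {} if edges is None else edges
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue  # ignore blank lines
        id1, id2, attr1, attr2, attr_x, attr_y, attr_z = row
        # Vertex A and vertex B: create if missing, otherwise update the attribute.
        vertices.setdefault(id1, {})["attr"] = attr1
        vertices.setdefault(id2, {})["attr"] = attr2
        # Edge A->B keyed by (id1, id2): a repeated pair only updates its attributes.
        edges.setdefault((id1, id2), {}).update(
            attrX=int(attr_x), attrY=int(attr_y), attrZ=int(attr_z)
        )
    return vertices, edges


vertices, edges = build_graph(SAMPLE)
print(len(vertices), len(edges))  # prints "5 4"
```

For the sample data this gives 5 vertices (10, 11, 12, 13, 14), each with its attribute, and 4 edges, which is what I expected the ETL run to produce.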

So these are the questions:

  • Is there a way in the vertex clause to specify vertex attributes/fields the same way that one can do for edges (i.e. edgeFields)? The documentation doesn't mention anything about it but it seems odd that you wouldn't be able to do it.

  • Rather than relying on the edge to create the outbound vertex, should I instead be creating two vertices explicitly and if so how do I specify that in the configuration file? When I try to add two "vertex" clauses it only seems to pick up the last one as the "current" vertex.

  • It's possible that the specific edge (id1 -> id2) already exists. Is it possible to only update the edge attributes in this case?

My sinking feeling is that, given the number of things I'm trying to pack into this, it will be simpler to write my own ETL (e.g. using the Java API) instead of relying on oetl. I was hoping to avoid that, if only because a configuration file is more maintainable.

Michela Bonizzi
G. Hirpara
  • Hi, the latest version (2.2.3) is now available, and since version 2.2.0 there is a new feature called Teleporter that replaces ETL. Why don't you try it? – Michela Bonizzi Jul 08 '16 at 13:59
  • Hi, thanks for the information. I did look at the documentation for Teleporter and wasn't sure it would work for me, because it doesn't appear to handle CSV files and because it's only in the Enterprise Edition. If this is not the case I'd be glad to look at it again. – G. Hirpara Jul 08 '16 at 14:16
