
I would like to import a very simple directed graph file in CSV format into OrientDB. Concretely, the file is the roadNet-PA dataset from the SNAP collection https://snap.stanford.edu/data/roadNet-PA.html. The first lines of the file are as follows:

# Directed graph (each unordered pair of nodes is saved once)
# Pennsylvania road network
# Nodes: 1088092 Edges: 3083796
# FromNodeId    ToNodeId
0       1
0       6309
0       6353
1       0
6353    0
6353    6354

There is only one type of vertex (a road intersection), and edges carry no information (I suppose OrientDB lightweight edges are the best option for this). Note also that the two node IDs on each line are separated by tabs.
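To make the file format concrete, here is a minimal Python sketch (not part of OrientDB, function names are hypothetical) that parses the edge list the way described: lines starting with '#' are header comments, and each data line holds two tab-separated node IDs. Counting distinct nodes and edges gives a sanity check against the counts in the header.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (from_id, to_id) pairs from a SNAP-style edge list.

    Lines starting with '#' are header comments and are skipped.
    str.split() with no argument splits on any run of whitespace,
    so it handles the tab separator used in roadNet-PA.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()
        yield int(src), int(dst)


def count_nodes_and_edges(f: TextIO) -> Tuple[int, int]:
    """Return (number of distinct node IDs, number of edge lines)."""
    nodes = set()
    edges = 0
    for src, dst in read_edges(f):
        nodes.add(src)
        nodes.add(dst)
        edges += 1
    return len(nodes), edges


if __name__ == "__main__":
    sample = io.StringIO(
        "# Directed graph (each unordered pair of nodes is saved once)\n"
        "# FromNodeId\tToNodeId\n"
        "0\t1\n0\t6309\n0\t6353\n1\t0\n6353\t0\n6353\t6354\n"
    )
    print(count_nodes_and_edges(sample))  # prints (5, 6)
```

Run against the full file with `open("/tmp/roadNet-PA.csv")` instead of the sample to compare with the "Nodes: ... Edges: ..." header line.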

I've tried to write a simple ETL configuration to import the file, without success. Here it is:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "   ", "skipFrom": 1, "skipTo": 4 } },
    { "vertex": { "class": "Intersection" } },
    { "edge": { "class": "Road" } }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 

The ETL runs, but it does not import the file as I expect. I suppose the problem is in the transformers. My idea is to read the CSV line by line and create an edge connecting both vertices, but I'm not sure how to express this in an ETL file. Any ideas?
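The intended per-row logic can be modelled in plain Python, independent of OrientDB: look up the source vertex by its ID and create it if missing, do the same for the target, then add one edge. This is only a sketch of the desired behaviour (the dict stands in for the UNIQUE index on Intersection, and the function names are hypothetical), not how the ETL engine is implemented.

```python
from typing import Dict, Iterable, List, Tuple


def build_graph(
    rows: Iterable[Tuple[int, int]],
) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """In-memory model of the intended import: one vertex per ID, one edge per row.

    vertex_index maps an intersection ID to a vertex handle (here just an
    insertion counter), playing the role of a UNIQUE index on the id field.
    """
    vertex_index: Dict[int, int] = {}
    edges: List[Tuple[int, int]] = []

    def lookup_or_create(node_id: int) -> int:
        # Reuse the existing vertex if this ID was seen before, otherwise create it
        if node_id not in vertex_index:
            vertex_index[node_id] = len(vertex_index)
        return vertex_index[node_id]

    for src, dst in rows:
        v_from = lookup_or_create(src)  # source vertex, created at most once
        v_to = lookup_or_create(dst)    # target vertex, created if not seen yet
        edges.append((v_from, v_to))    # exactly one edge per CSV row
    return vertex_index, edges
```

With the six sample rows above, this yields 5 vertices and 6 edges; the key point is that each ID maps to a single vertex no matter how many rows mention it.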

Pablo R. Mier

2 Answers


Try this:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roadNet-PA.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t", "skipFrom": 1, "skipTo": 4,
               "columnsOnFirstLine": false, 
               "columns":["id", "to"] } },
    { "vertex": { "class": "Intersection" } },
    { "merge": { "joinFieldName":"id", "lookup":"Intersection.id" } },
    { "edge": {
       "class": "Road",
       "joinFieldName": "to",
       "lookup": "Intersection.id",
       "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "wal": false,
       "batchCommit": 1000,
       "tx": true,
       "txUseLog": false,
       "useLightweightEdges" : true,
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 

To speed up loading, I suggest shutting down the server and running the ETL with "plocal:" instead of "remote:". For example, replace the existing dbURL with:

       "dbURL": "plocal:/orientdb/databases/roads",
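If you keep one ETL file per setup, the switch from "remote:" to "plocal:" can be scripted. A minimal sketch, assuming the config is strict JSON (Python's json module rejects trailing commas) and that the database path is the one from the example; the script and file names are hypothetical.

```python
import json
import sys


def with_plocal_url(config_text: str, db_path: str) -> str:
    """Return the ETL config with loader.orientdb.dbURL pointing at a plocal: path."""
    config = json.loads(config_text)
    config["loader"]["orientdb"]["dbURL"] = "plocal:" + db_path
    return json.dumps(config, indent=2)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python to_plocal.py roads-etl.json /orientdb/databases/roads > roads-etl-plocal.json
    with open(sys.argv[1]) as f:
        print(with_plocal_url(f.read(), sys.argv[2]))
```

Remember that plocal access requires the server to be stopped first, as noted above.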
Lvca
  • Thanks for the answer. I'm not sure if I did something wrong, but I found two errors. First, the skipFrom and skipTo settings are not working, as the first lines are still passed to the transformer. I removed those lines by hand and then hit a second problem: OrientVertex cannot be cast to ODocument. Here is the log: http://pastebin.com/i6QGRcUV – Pablo R. Mier Jul 10 '15 at 07:20
  • Try moving the merge before the vertex. – Lvca Jul 10 '15 at 14:50
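Since skipFrom/skipTo did not seem to take effect here, the '#' header lines can be stripped with a short script instead of by hand. A minimal sketch; the script name is hypothetical, and the output path simply matches the /tmp/roads.csv used in the final configuration below.

```python
import sys
from typing import Iterable, Iterator


def strip_comments(lines: Iterable[str]) -> Iterator[str]:
    """Drop '#' header lines and blank lines, keeping data rows unchanged."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line


def strip_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path without the header, line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_comments(src))


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python strip_header.py /tmp/roadNet-PA.csv /tmp/roads.csv
    strip_file(sys.argv[1], sys.argv[2])
```

Streaming the file keeps memory use flat, which matters for a file with roughly three million edge lines.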

It finally worked. I moved the merge transformer before the vertex transformer, as Luca suggested. I also renamed the 'id' field to 'from' to avoid the error "property key is reserved for all elements id". Here is the snippet:

{
  "config": {
    "log": "debug"
  },
  "source" : {
    "file": { "path": "/tmp/roads.csv" }
  },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t",
               "columnsOnFirstLine": false, 
               "columns":["from", "to"] } },
    { "merge": { "joinFieldName":"from", "lookup":"Intersection.from" } },
    { "vertex": { "class": "Intersection" } },
    { "edge": {
       "class": "Road",
       "joinFieldName": "to",
       "lookup": "Intersection.from",
       "unresolvedLinkAction": "CREATE"
      }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:localhost/roads",
       "dbType": "graph",
       "wal": false,
       "batchCommit": 1000,
       "tx": true,
       "txUseLog": false,
       "useLightweightEdges" : true,
       "classes": [
         {"name": "Intersection", "extends": "V"},
         {"name": "Road", "extends": "E"}
       ], "indexes": [
         {"class":"Intersection", "fields":["from:integer"], "type":"UNIQUE" }
       ]
    }
  }
} 
Pablo R. Mier