2

I'm interested in loading some data into an OrientDB from some CSV files that contain spatial coordinates in WGS84 Lat/Long.

I'm using OrientDB 2.2.8 and have the lucene spatial module added to my $ORIENTDB_HOME/lib directory.

I'm loading my data into a database using ETL and would like to add the spatial index but I'm not sure how to do this.

Say my CSV file has the following columns:

  • Label (string)
  • Latitude (float)
  • Longitude (float)

I've tried this in my ETL:

"loader": {
    "orientdb": {
        "dbURL": "plocal:myDatabase.orientdb",
        "dbType": "graph",
        "batchCommit": 1000,
        "classes": [ { "name": "vertex", "extends", "V" } ],
        "indexes": [ { "class": "vertex", "fields":["Label:string"], "type":"UNIQUE" },
                     { "class": "Label", "fields":["Latitude:float","Longitude:float"], "type":"SPATIAL" }
                   ]
    }
}

but it's not working. I get the following error message:

ETL process has problem: com.orientechnologies.orient.core.index.OIndexException: Index with type SPATIAL and algorithm null does not exist.

Has anyone looked into creating spatial indices via ETL? Most of the stuff I'm seeing on this is using either Java or via direct query.

Thanks in advance for any advice.

Community
  • 1
  • 1
TxAG98
  • 1,070
  • 2
  • 10
  • 25

1 Answers1

0

I was able to get it to load using the legacy spatial capabilities.

I put together a cheezy dataset that has some coordinates for a few of the Nazca line geoglyphs:

Name,Latitude,Longitude
Hummingbird,-14.692131,-75.148892
Monkey,-14.7067274,-75.1475391
Condor,-14.6983457,-75.1283374
Spider,-14.694363,-75.1235815
Spiral,-14.688309,-75.122757
Hands,-14.694459,-75.113881
Tree,-14.693897,-75.114467
Astronaut,-14.745222,-75.079755
Dog,-14.706401,-75.130788

I used a script to create my GeoGlyph class, createVertexGeoGlyph.osql:

set echo true
connect PLOCAL:./nazca.orientdb admin admin
CREATE CLASS GeoGlyph EXTENDS V CLUSTERS 1
CREATE PROPERTY GeoGlyph.Name      STRING
CREATE PROPERTY GeoGlyph.Latitude  FLOAT
CREATE PROPERTY GeoGlyph.Longitude FLOAT
CREATE PROPERTY GeoGlyph.Tag       EMBEDDEDSET STRING
CREATE INDEX GeoGlyph.index.Location ON GeoGlyph(Latitude,Longitude) SPATIAL ENGINE LUCENE

which I load into my database using

$ console.sh createVertexGeoGlyph.osql

I do it this way because it seems to work more consistently for me. I've had some difficulties with getting the ETL engine to create defined properties when I've wanted it to off CSV imports. Sometimes it wants to cooperate and create my properties and other times has trouble.

So, the next step to get the data in is to create my .json files for the ETL process. I like to make two, one that is file-specific and another that is a common file since often I have datasets that span multiple files.

First, I have a my nazca_liens.json file:

{
    "config": {
        "log": "info",
        "fileDirectory": "./",
        "fileName": "nazca_lines.csv"
    }
}

Next is the commonGeoGlyph.json file:

{
    "begin": [
               { "let": { "name": "$filePath",  "expression": "$fileDirectory.append($fileName )" } },
             ],
    "config": { "log": "debug" },
    "source": { "file": { "path": "$filePath" } },
    "extractor":
        {
        "csv": { "ignoreEmptyLines": true,
                 "nullValue": "N/A",
                 "separator": ",",
                 "columnsOnFirstLine": true,
                 "dateFormat": "yyyy-MM-dd"
               }
        },
    "transformers": [
            { "vertex": { "class": "GeoGlyph" } },
            { "code":   { "language":"Javascript",
                          "code": "print('>>> Current record: ' + record); record;" }
            }
        ],
    "loader": {
        "orientdb": {
            "dbURL": "plocal:nazca.orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [],
            "indexes": []
        }
    }
}

There's more stuff in the file than is necessary, I use it as a template for a lot of stuff. In this case, I don't have to create my index in the ETL file itself because I already created it in the createVertexGeoGlyph.osql file.

To load the data I just use the oetl.sh script:

$ oetl.sh commonGeoGlyph.json nazca_lines.json

This is what's working for me... I'm sure there are better ways to do it, but this works. I'm posting this here to tie off the question. Hopefully someone will find this to be useful.

TxAG98
  • 1,070
  • 2
  • 10
  • 25