Our system creates OrientDB databases programmatically and uses one database per customer (before anyone jumps to dismiss this design: the reasons are security, the ability to move a given customer's data between datacenters/regions, and the option to relocate a customer on-premise).

This works great with OrientDB in single-server mode. However, when the database is set up in distributed mode (3 servers, on Amazon), the behaviour is, to put it mildly, weird. I know the docs don't say anything about this being supported, but I couldn't find anything that says it isn't, either.

Sometimes the database is created fine, but the client locks indefinitely (in OAdaptiveLock.lock()). Sometimes the whole cluster needs to be restarted before the database can be used. And sometimes, as is the case at the time of writing, one OrientDB node shuts itself down after it appears to be syncing with the others (Address[1.2.3.4]:2434 is SHUTTING_DOWN [LifecycleService] -> Terminating forcefully... [Node]). The shutdown message is preceded by a stacktrace (see below).
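
As a stopgap for the hanging client, the creation call could be wrapped in a timeout guard so the caller at least gets control back. The following is only a sketch under assumptions: `GuardedCreate`, `runWithTimeout` and the `createTask` parameter are hypothetical names, the lambdas in `main` are stand-ins for the real database-creation call, and nothing here touches the OrientDB API or fixes the underlying problem. Cancelling only interrupts the worker thread, so it helps only if the blocked call responds to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedCreate {

    /**
     * Runs a (hypothetical) database-creation task on a worker thread and
     * stops waiting after timeoutMillis instead of blocking the caller forever.
     * Returns true if the task finished in time, false if it timed out.
     */
    public static boolean runWithTimeout(Callable<Void> createTask, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "db-create");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<Void> future = executor.submit(createTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a creation call that returns quickly.
        boolean fast = runWithTimeout(() -> { Thread.sleep(50); return null; }, 1000);
        // Stand-in for a creation call that hangs.
        boolean slow = runWithTimeout(() -> { Thread.sleep(60_000); return null; }, 200);
        System.out.println("fast task completed: " + fast);
        System.out.println("slow task completed: " + slow);
    }
}
```

A `false` result only means the caller stopped waiting; the database may still be half-created on the servers, so it should be checked before retrying.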

So, to my questions:

  1. Does OrientDB support creating databases online in distributed mode?
  2. If so, what could I be doing wrong?
  3. If not, are there any plans to support this in the future?

Thanks in advance!

./Anders

Stacktrace:

2016-01-28 14:00:01:395 SEVER [infogile02] error on creating cluster 'superclassesedge_infogile02' in class 'superClassesEdge':  [OHazelcastPlugin][infogile02] Error on starting distributed plugin
com.orientechnologies.orient.server.distributed.ODistributedException: com.orientechnologies.orient.server.distributed.ODistributedException: Error on creating cluster 'superclassesedge_infogile02' in class 'superClassesEdge'
    at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.configureDatabase(OHazelcastDistributedDatabase.java:241)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDatabaseFromNetwork(OHazelcastPlugin.java:1131)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.requestDatabase(OHazelcastPlugin.java:971)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDatabase(OHazelcastPlugin.java:908)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabases(OHazelcastPlugin.java:1468)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:185)
    at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:979)
    at com.orientechnologies.orient.server.OServer.activate(OServer.java:346)
    at com.orientechnologies.orient.server.OServerMain.main(OServerMain.java:41)
Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: Error on creating cluster 'superclassesedge_infogile02' in class 'superClassesEdge'
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installLocalClusterPerClass(OHazelcastPlugin.java:1631)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDbClustersForLocalNode(OHazelcastPlugin.java:1300)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin$2.call(OHazelcastPlugin.java:1134)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin$2.call(OHazelcastPlugin.java:1131)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.configureDatabase(OHazelcastDistributedDatabase.java:239)
    ... 8 more
Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Error on saving record #0:1
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:2044)
    at com.orientechnologies.orient.core.tx.OTransactionNoTx.saveRecord(OTransactionNoTx.java:159)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:2568)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.save(ODatabaseDocumentTx.java:121)
    at com.orientechnologies.orient.core.record.impl.ODocument.save(ODocument.java:1768)
    at com.orientechnologies.orient.core.record.impl.ODocument.save(ODocument.java:1764)
    at com.orientechnologies.orient.core.metadata.schema.OSchemaShared$1.call(OSchemaShared.java:1213)
    at com.orientechnologies.orient.core.db.OScenarioThreadLocal.executeAsDistributed(OScenarioThreadLocal.java:71)
    at com.orientechnologies.orient.core.metadata.schema.OSchemaShared.saveInternal(OSchemaShared.java:1208)
    at com.orientechnologies.orient.core.metadata.schema.OSchemaShared.releaseSchemaWriteLock(OSchemaShared.java:642)
    at com.orientechnologies.orient.core.metadata.schema.OClassImpl.releaseSchemaWriteLock(OClassImpl.java:1824)
    at com.orientechnologies.orient.core.metadata.schema.OClassImpl.releaseSchemaWriteLock(OClassImpl.java:1819)
    at com.orientechnologies.orient.core.metadata.schema.OClassImpl.addCluster(OClassImpl.java:1088)
    at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installLocalClusterPerClass(OHazelcastPlugin.java:1624)
    ... 12 more
Caused by: java.lang.NullPointerException
    at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.endAtomicOperation(OAtomicOperationsManager.java:148)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doUpdateRecord(OAbstractPaginatedStorage.java:2046)
    at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.updateRecord(OAbstractPaginatedStorage.java:971)
    at com.orientechnologies.orient.server.distributed.ODistributedStorage.updateRecord(ODistributedStorage.java:708)
    at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.executeSaveRecord(ODatabaseDocumentTx.java:2005)
    ... 25 more

2016-01-28 14:00:01:398 INFO  [10.0.9.105]:2434 [orientdb] [3.5.3] Address[10.0.9.105]:2434 is SHUTTING_DOWN [LifecycleService]
2016-01-28 14:00:01:398 WARNI [10.0.9.105]:2434 [orientdb] [3.5.3] Terminating forcefully... [Node]
2016-01-28 14:00:01:399 INFO  [10.0.9.105]:2434 [orientdb] [3.5.3] Shutting down connection manager... [Node]
Heintz
  • I think I had the same issues. I still can't figure it out. – Pitipong Guntawong Jan 29 '16 at 06:45
  • On my part, I think this was a serious case of tl;dr: the distributed architecture doc (http://orientdb.com/docs/2.0/orientdb.wiki/Distributed-Architecture.html) states that "creation of a database on multiple nodes could cause synchronization problems when clusters are automatically created. Please create the databases before to run in distributed mode". – Heintz Jan 31 '16 at 09:17
  • I figured out the issue, but I'm not sure what the cause is. Basically, I tried to run OrientDB in Docker in distributed mode, but it didn't seem to work. I then tried a fresh OrientDB install on Ubuntu, and that did work (by copying the folder from another server). – Pitipong Guntawong Mar 17 '16 at 10:09
  • AWS's biggest draw for us is that we can auto-scale our back-end architecture when needed. The biggest feature that got us interested in OrientDB, despite its other faults, is that it is distributed, replicated and supports AWS. It's a shame that those features don't work well together under auto-scaling. This may be more of an issue with Hazelcast than with OrientDB. – anber Aug 24 '16 at 07:08

1 Answer

Severe case of tl;dr on my part. The OrientDB docs on distributed architecture clearly state "creation of a database on multiple nodes could cause synchronization problems when clusters are automatically created. Please create the databases before to run in distributed mode", but I didn't read that far.

According to the docs, the suggested solution seems to be "Partitioned Graphs" (described here: http://orientdb.com/docs/2.0/orientdb.wiki/Partitioned-Graphs.html). That solution doesn't really address all our concerns, but is in theory good enough.

In practice, however, it doesn't work for us: it requires a significant rewrite, since transactions need to be managed differently. More on that in another topic....

Heintz
  • We jumped through so many hoops to get Hazelcast to run on AWS, and then we discovered this issue. Why on earth would you configure AWS in hazelcast.xml if auto-scaling is not supported by OrientDB? If you have to set up your servers beforehand, then it's far easier to use tcp-ip as the discovery strategy. – anber Aug 24 '16 at 07:27